Abstract —In a variety of applications, there is a need to 
authenticate content that has experienced legitimate editing in 
addition to potential tampering attacks. We develop one formulation 
of this problem based on a strict notion of security, and 
characterize and interpret the associated information-theoretic 
performance limits. The results can be viewed as a natural 
generalization of classical approaches to traditional authentication. 
Additional insights into the structure of such systems and 
their behavior are obtained by further specializing the results to 
Bernoulli and Gaussian cases. The associated systems are shown 
to be substantially better in terms of performance and/or security 
than commonly advocated approaches based on data hiding and 
digital watermarking. Finally, the formulation is extended to 
obtain efficient layered authentication system constructions. 

Index Terms — coding with side information, data hiding, 
digital signatures, digital watermarking, information embedding, 
joint source-channel coding, multimedia security, robust hashing, 
tamper-proofing, transaction-tracking 

I. Introduction 

In traditional authentication problems, the goal is to 
determine whether some content being examined is an exact 
replica of what was created by the author. Digital signature 
techniques [1] are a natural tool for addressing such problems. 
In such formulations, the focus on exactness avoids 
consideration of semantic issues. However, in many emerging 
applications, semantic issues are an integral aspect of the 
problem, and cannot be treated separately. As contemporary 
examples, the content of interest may be an audio or video 
waveform, or an image, and before being presented to a 
decoder the waveform may experience any of a variety of 
possible perturbations, including, for example, degradation 
due to noise or compression; transformation by filtering, 
resampling, or transcoding; or editing to annotate, enhance, or 
otherwise modify the waveform. Moreover, such perturbations 
may be intentional or unintentional, benign or malicious, and 
semantically significant or not. Methods for reliable authentication 
from such perturbed data are important as well. 

The spectrum of applications where such authentication 
capabilities will be important is enormous, ranging from 
drivers’ licenses, passports, and other government-issued photo 
identification; to news photographs and interview tapes; to 
state-issued currency and other monetary instruments; to legal 
evidence in the form of audio and video recordings in court 
cases. Indeed, the rapidly increasing ease with which such 
content can be digitally manipulated in sophisticated ways using 
inexpensive systems, whether for legitimate or fraudulent 
purposes, is of considerable concern in these applications. 

Arising out of such concerns, a variety of technologies have 
been introduced to facilitate authentication in such settings. 
Examples include various physical watermarking technologies 
(such as hologram imprinting in images) as well as more 
recent digital descendants. See, e.g., [2] for some of the rich 
history in this area going back several hundred years. However, 
regardless of the implementation, all involve the process of 
marking or altering the content in some way, which can be 
viewed as a form of encoding. 

A rather generic problem that encompasses essentially all 
the applications of interest is that of transaction-tracking 
in a content migration scenario. In this scenario, there are 
essentially three types of participants involved in the migration 
of a particular piece of content. There is the original author 
or creator of the content, who delivers an encoding of it. 1 
There is the editor who makes modifications to this encoded 
content, and publishes the result. 2 And there is the reader 
or end-user for whom the published work is intended. The 
reader wants to be able to determine 1) whether published 
work being examined was derived from content originally 
generated by the author, and 2) how it was modified by the 
editor. At the same time, the editor wants the author’s encoding 
to be (semantically) close to the original content, so that the 
modifications can take the semantics into account as necessary. 

In the recent literature, researchers have proposed a variety 
of approaches to such problems based on elements of 
digital watermarking, cryptography, and content classification; 
see, e.g., [3]-[18] and the references therein. Ultimately, the 
methods developed to date implicitly or explicitly attempt 
to balance the competing goals of robustness to benign 
perturbations, security against tampering attacks, and encoding 
distortion. 

Within this literature, there are two basic types of approaches. 
In the first, the authentication mechanism is based 
on embedding what is referred to as a “fragile” watermark 
known to both encoder and decoder into the content of interest. 
At the decoder, a watermark is extracted and compared to 
the known watermark inserted by the encoder. The difference 
between the extracted watermark and the known watermark is 
then interpreted as a measure of authenticity. Examples of this 
basic approach include [5], [7], [13], [14]. 

1 There are no inherent restrictions on what can constitute “content” in this 
generic problem. Typical examples include video, audio, imagery, text, and 
various kinds of data. 

2 The motives and behavior of the editor naturally depend on the particular 
application and situation. At one extreme the editor might just perform some 
benign resampling or other transcoding, or, at the other extreme, might attempt 
to create a forgery from the content. In the latter case, the editor would be 
considered an attacker. 
The second type of approach is based on a “robust” watermarking 
strategy, whereby the important features of the 
content are extracted, compressed and embedded back into 
the content by the encoder. The decoder attempts to extract 
the watermark from the content it obtains and authenticates 
by comparing the features encoded in the watermark to the 
features in the content itself. This strategy is sometimes termed 
“self-embedding.” Examples of this basic approach include 
[4], [11], [15]. 

Despite the growing number of proposed systems, many 
basic questions remain about 1) how to best model the problem 
and what we mean by authentication, 2) what the associated 
fundamental performance limits are, and 3) what system structures 
can and cannot approach those limits. More generally, 
there are basic questions about the degree to which the 
authentication, digital watermarking, and data hiding problems 
are related or not. 

While information-theoretic treatments of authentication 
problems are just emerging, there has been a growing literature 
in the information theory community on digital watermarking 
and data hiding problems, and more generally problems of 
coding with side information, much of which builds on the 
foundation of [19]-[21]; see, e.g., [22]-[42] and the references 
therein. Collectively, this work provides a useful context within 
which to examine the topic of authentication. 

Our contribution in this paper is to propose one possible 
formulation for the general problem of authentication with a 
semantic model, and examine its implications. In particular, 
using distortion criteria to capture semantic aspects of the 
problem, we assess performance limits in terms of the inherent 
trade-offs between security, robustness, and distortion, and in 
turn develop the structure of systems that make these trade-offs 
efficiently. As we will show, these systems have important 
distinguishing characteristics from those proposed to date. We 
also see that under this model, the general authentication 
problem is substantially different from familiar formulations 
of the digital watermarking and data hiding problems, and has 
a correspondingly different solution. 

A detailed outline of the paper is as follows. We begin by 
briefly defining our notation and terminology in Section II. 
Next, in Section III, we develop a system model and problem 
formulation, quantifying a notion of authentication. In 
Section IV, we characterize the performance limits of such 
systems via our main coding theorem. Section V contains 
both the associated achievability proof, which identifies the 
structure of good systems, and a converse. In Section VI, the 
results are applied to the case of binary content with Hamming 
distortion measures, and in Section VII to Gaussian content 
with quadratic distortion measures. Section VIII then analyzes 
other classes of authentication techniques in the context of 
our framework, and shows that they are inherently either less 
efficient or less secure than the systems developed here. Next, 
Section IX generalizes the results of the paper to include 
layered systems that support multiple levels of authentication. 
Finally, Section X contains some concluding remarks. 


II. Notation and Terminology 

We use standard information theory notation (e.g., as found 
in [43]). Specifically, $E[A]$ denotes expectation of the random 
variable $A$, $H(A)$ and $I(B;C)$ denote entropy and mutual 
information, and $A \leftrightarrow B \leftrightarrow C$ denotes the Markov condition 
that random variables $A$ and $C$ are independent given $B$. We 
use the notation $v_i^j$ to denote the sequence $\{v_i, v_{i+1}, \ldots, v_j\}$, 
and define $v^n = v_1^n$. Alphabets are denoted by uppercase 
calligraphic letters, e.g., $\mathcal{S}$, $\mathcal{X}$. We use $|\cdot|$ to denote the 
cardinality of a set or alphabet. 

Since the applications are quite varied, we keep our terminology 
rather generic. The content of interest, as well as 
its various encodings and reconstructions, will be generically 
referred to as “signals,” regardless of whether they refer to 
video, audio, imagery, text, data, or any other kind of content. 
The original content we will also sometimes simply refer 
to as the “source.” Moreover, we will generally associate 
any manipulations of the encoded content with the “editor,” 
regardless of whether any human is involved. However, as an 
exception, we will often use the term “attacker” in lieu of 
“editor” for cases where the manipulations are specifically of 
a malicious nature. 

III. System Model and Problem Formulation 

Our system model for the transaction-tracking scenario 
is as depicted in Fig. 1. To simplify the exposition, we 
model the original content as an independent and identically 
distributed (i.i.d.) 3 sequence $S_1, S_2, \ldots, S_n$. In practice $S^n$ 
could correspond to sample values or signal representations in 
some suitable basis. 

The encoder takes as input the block of $n$ source samples 
$S^n$, producing an output $X^n$ that is suitably close to $S^n$ 
with respect to some distortion measure. The encoder is under 
the control of the content creator. The encoded signal then 
passes through a channel, which models the actions of the 
generic “editor”, and encompasses all processing experienced 
by the encoded signal before it is made available to the end-user 
as $Y^n$. This processing would include all effects ranging 
from routine handling to malicious tampering. The decoder, 
which is controlled by the end-user, either produces, to within 
some fidelity as quantified by a suitable distortion measure, a 
reconstruction $\hat{S}^n$ of the source that is guaranteed to be free 
from the effects of any modifications by the editor, or declares 
that it is not possible to produce such a reconstruction. We 
term such reconstructions “authentic.” 

Our approach to the associated channel modeling issues in 
the formulation of Fig. 1 has some novel features, and thus 
warrants special discussion. Indeed, as we now discuss, our 
approach to such modeling is not to anticipate the possible 
behaviors of the editor, but to effectively constrain them. In 
particular, we avoid choosing a model that tries to characterize 
the range of processing the editor might undertake. If we did, 

3 Our results do not depend critically on the i.i.d. property, which is chosen 
for convenience. In fact, the i.i.d. model is sometimes pessimistic; better 
performance can often be obtained by taking advantage of correlation present 
in the source or channel. We believe that qualitatively similar results would 
be obtained in more general settings (e.g., using techniques from [44], [45]). 
Fig. 1. Authentication system model. The source $S^n$ is encoded by the content creator into $X^n$, incurring some distortion. The channel models the actions of 
the editor, i.e., all processing experienced by the encoded content before it is made available to the end-user. The decoder, controlled by the end-user, produces 
from the channel output $Y^n$ either an authentic reconstruction $\hat{S}^n$ of the source to within some fidelity, or indicates that authentication is not possible using 
the special symbol $\varnothing$. 


the security properties of the resulting system would end up 
being sensitive to any modeling errors, i.e., to any behavior of 
the editor that is inconsistent with the model. 

Instead, the focus is on choosing a model that defines 
the range of processing the editor can undertake and have 
such edits accepted by the end-user. We refer to this as our 
“reference channel model.” Specifically, we effectively design 
the system such that the decoder will successfully authenticate 
the modified content if and only if the edits are consistent 
with the reference channel model. Thus, the editor is free 
to edit the content in any way (and we make no attempt to 
model the range of behavior), but the subset of behaviors for 
which the system will authenticate is strictly controlled via 
the reference channel construct. Ultimately, since the end-user 
will not accept content that cannot be authenticated, the editor 
will constrain its behavior according to the reference channel. 

From this perspective, the reference channel model is a 
system design parameter, and thus is known a priori to 
encoders, decoders, and editors. To simplify our analysis, we 
will restrict our attention to memoryless probabilistic reference 
channel models. In this case, the model is characterized by a 
simple conditional distribution $p(y|x)$. 

As our main result, in Section IV we characterize when 
authentication systems with the above-described behavior are 
possible, and when they are not. Specifically, let $D_e$ denote 
the encoding distortion, i.e., the distortion experienced in the 
absence of a channel, and let $D_r$ denote the distortion in the 
reconstruction produced by the decoder when the signal can be 
authenticated, i.e., when the channel transformations are consistent 
with the chosen reference distribution $p(y|x)$. Then we 
determine which distortion pairs $(D_e, D_r)$ are asymptotically 
achievable. 

We emphasize that the distortion pair $(D_e, D_r)$ corresponds 
precisely to the performance characteristics of direct interest 
in the system for the transaction-tracking scenario. Indeed, 
a small $D_e$ means the editor is given a faithful version of 
the original content to work with. Moreover, a small $D_r$ means 
that the end-user is able to accurately estimate the editor’s 
modifications by comparing the decoder input to the authentic 
reconstruction. 

A. Defining “Authenticity” 

To develop our main results, we first need to quantify 
the concept of an “authentic reconstruction.” Recall that our 
intuitive notion of an authentic reconstruction is one that is 
free from the effects of the edits when the reference channel 
is in effect. Formally, this is naturally expressed as follows. 


Definition 1: A reconstruction $\hat{S}^n$ produced by the decoder 
from the output $Y^n$ of the reference channel is said to be 
authentic if it satisfies the Markov condition below: 

$\hat{S}^n \leftrightarrow \{S^n, X^n\} \leftrightarrow Y^n$ (1) 

Note that as special cases, this definition would include 
systems in which, for example, $\hat{S}^n$ is a deterministic or 
randomized function of $S^n$. More generally, this definition 
means that the authentic reconstructions are effectively defined 
by the encoder in such systems. This will have implications 
later in the system design. 

B. An Example Distortion Region 

Before developing our main result, we illustrate with an 
example the kinds of results that will be obtained. This example 
corresponds to a problem involving a symmetric Bernoulli 
source, Hamming distortion measures, and a (memoryless) 
binary symmetric reference channel with crossover probability 
$p$. Under this example scenario, the editor is allowed to flip a 
fraction $p$ of the binary source samples, and the end-user must 
(almost certainly) be able to generate an authentic reconstruction 
from such a perturbation. If the edits are generated from 
a different distribution, such as a binary symmetric channel 
with a crossover probability greater than $p$, then the decoder 
must (almost certainly) declare an authentication failure. 

The corresponding achievable distortion region is depicted 
in Fig. 2. Several points on the frontier are worth discussing. 
First, note that the upper left point on the frontier, i.e., 
$(D_e, D_r) = (0, 1/2)$, reflects that if no encoding distortion 
is allowed, then authentic reconstructions are not possible, 
since the maximum possible distortion is incurred. At the other 
extreme, the lower right point of the frontier, i.e., $(D_e, D_r) = 
(1/2, p)$, corresponds to a system in which the source is first 
source coded to distortion $p$, after which the resulting bits are 
digitally signed and channel coded for the binary symmetric channel (BSC). 

While no amount of encoding distortion can reduce the 
reconstruction distortion below $p$, the point $(D_e, D_r) = (p, p)$ 
on the frontier establishes that a reconstruction distortion of $p$ 
is actually achievable with much less encoding distortion than 
the lower right point suggests. In fact, because the required 
encoding distortion is only $p$, the decoder can be viewed as 
completely eliminating the effects of the reference channel 
when it is in effect: the minimum achievable reconstruction 
distortion $D_r$ is the same as the distortion $D_e$ at the output of 
the encoder. 

The more general structure of the frontier is also worth 
observing. In particular, $D_r$ is a decreasing function of $D_e$ 
Fig. 2. The shaded area depicts the achievable distortion region for a 
symmetric Bernoulli source used in conjunction with a binary symmetric 
reference channel of crossover probability $p$. Distortions are with respect to 
the Hamming measure. The case $p = 0$ corresponds to traditional digital 
signatures. If authentication were not required, the point $(D_e = 0, D_r = p)$ 
could be achieved. 

along the frontier. This reflects that the objectives of a small $D_e$ 
(which the editor wants) and a small $D_r$ (which the end-user 
wants) are conflicting and a fundamental tradeoff is involved 
for any given reference channel. In fact, as we will see in the 
sequel, this behavior is not specific to this example, but a more 
general feature of our authentication problem formulation. 4 

Finally, observe that the achievable region decreases monotonically 
with $p$, the severity of edits allowed. Thus, if one 
has particular target encoding and reconstruction distortions, 
then this effectively limits how much editing can be tolerated. 
As the extreme point, the case $p = 0$ in which no editing 
is allowed corresponds to the traditional scenario for digital 
signatures. In this case, as the figure reflects, authentication 
is achievable without incurring either encoding distortion or 
reconstruction distortion. It is worth noting that the nature of 
the interplay between the severity of the reference channel 
and the achievable distortion region is not specific to this 
example, but arises more generally with this formulation of 
the authentication problem. 

IV. Characterization of Solution: Coding Theorems 

An instance of the authentication problem consists of the 
seven-tuple 

$(\mathcal{S}, p(s), \mathcal{X}, \mathcal{Y}, p(y|x), d_e(\cdot,\cdot), d_r(\cdot,\cdot))$. (2) 

We use $\mathcal{S}$ to denote the source alphabet (which is finite unless 
otherwise indicated), and $p(s)$ is its (i.i.d.) distribution. The 

4 This should not be surprising, since such tradeoffs frequently arise in joint 
source-channel coding problems with uncertain channels; see, e.g., [46]-[48]. 


channel input and output alphabets are $\mathcal{X}$ and $\mathcal{Y}$, and $p(y|x)$ 
is the (memoryless) reference channel law. Finally, $d_e(\cdot,\cdot)$ 
and $d_r(\cdot,\cdot)$ are the encoding and reconstruction distortion 
measures. 

A solution to this problem (i.e., an authentication scheme) 
consists of an algorithm that returns an encoding function 
$\Upsilon_n$, a decoding function $\Phi_n$, and a secret key $\theta$. The secret 
key is shared only between the encoder and decoder; all 
other information is known to all parties including editors. 
(For the interested reader, straightforward adaptations of our 
solutions to public-key implementations are summarized in 
the Appendix. However, we otherwise restrict our attention 
to private-key schemes in the paper to focus the exposition.) 

The secret key $\theta$ is a $k$-bit sequence with $k$ sufficiently large. 
The encoder is a mapping from the source sequence and the 
secret key to codewords, i.e., 

$\Upsilon_n(S^n, \theta): \mathcal{S}^n \times \{0,1\}^k \mapsto \mathcal{X}^n.$ 

The decoder is a mapping from the channel output and the 
secret key to either an authentic source reconstruction $\hat{S}^n$ (i.e., 
one satisfying (1)) or the special symbol $\varnothing$ that indicates such 
a reconstruction is not possible; whence, 

$\Phi_n(Y^n, \theta): \mathcal{Y}^n \times \{0,1\}^k \mapsto \mathcal{S}^n \cup \{\varnothing\}.$ 

Notice that since an authentic reconstruction must satisfy (1), 
and since the decoder must satisfy the Markov condition 
$\{S^n, X^n\} \leftrightarrow Y^n \leftrightarrow \Phi_n(Y^n, \theta)$, we have that 
$\hat{S}^n \leftrightarrow \{S^n, X^n\} \leftrightarrow Y^n \leftrightarrow \Phi_n(Y^n, \theta)$ 
forms a Markov chain only when 
successful decoding occurs. Thus, the authentic reconstruction 
$\hat{S}^n$ should be defined as a quantity that the decoder attempts 
to deduce, since defining $\hat{S}^n = \Phi_n(Y^n, \theta)$ will generally not 
satisfy (1). 

Henceforth, except when there is risk of confusion, we omit 
both the subscript n and the secret key argument from the 
encoding and decoding function notation, letting the dependence 
be implicit. Moreover, when the encoder and/or decoder 
are randomized functions, then all probabilities are taken over 
these randomizations as well as the source and channel law. 

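The relevant distortions are the encoding and decoding 
distortion, computed as the normalized sum of the respective (bounded) 
single-letter distortion functions $d_e$ and $d_r$, i.e., 

$\frac{1}{n}\sum_{i=1}^{n} d_e(S_i, X_i)$ and $\frac{1}{n}\sum_{i=1}^{n} d_r(S_i, \hat{S}_i).$ 

Evidently, 

$d_e: \mathcal{S} \times \mathcal{X} \mapsto \mathbb{R}^+$ (3) 

$d_r: \mathcal{S} \times \mathcal{S} \mapsto \mathbb{R}^+.$ (4) 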

The system can fail in one of three ways. The first two 
failure modes correspond to either the encoder introducing 
excessive encoding distortion, or the decoder failing to produce 
an authentic reconstruction with acceptable distortion when 
the reference channel is in effect. Accordingly, we define the 
overall distortion violation error event to be 

$\mathcal{E}_{DV} = \mathcal{E}_{D_e} \cup \mathcal{E}_{D_r}$ (5) 
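where, for any $\epsilon > 0$, 

$\mathcal{E}_{D_e} = \left\{ \frac{1}{n}\sum_{i=1}^{n} d_e(S_i, X_i) > D_e + \epsilon \right\}$ (6) 

$\mathcal{E}_{D_r} = \{\Phi_n(Y^n) = \varnothing\} \cup \left[ \left\{ \frac{1}{n}\sum_{i=1}^{n} d_r\big(S_i, [\Phi_n(Y^n)]_i\big) > D_r + \epsilon \right\} \cap \{\Phi_n(Y^n) \neq \varnothing\} \right].$ (7) 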

In the remaining failure mode, the system fails to produce 
the desired authentic reconstruction $\hat{S}^n$ from the channel 
output and, instead of declaring that authentication is not 
possible, produces an incorrect estimate. Specifically, we define 
the successful attack event according to 

$\mathcal{E}_{SA} = \{\Phi(Y^n) \neq \varnothing\} \cap \{\Phi(Y^n) \neq \hat{S}^n\}.$ (8) 
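
The three failure events can be made concrete with a small sketch. The following Python is illustrative only: the function names, the use of `None` to stand for the special symbol $\varnothing$, and the default Hamming distortion are assumptions made here for the example, not part of the formulation above.

```python
def average_distortion(d, a, b):
    """Per-letter average of the single-letter distortion d over two sequences."""
    assert len(a) == len(b) and len(a) > 0
    return sum(d(ai, bi) for ai, bi in zip(a, b)) / len(a)


def hamming(a, b):
    """Hamming single-letter distortion (an assumed choice for this sketch)."""
    return 0.0 if a == b else 1.0


def encoding_violation(s, x, D_e, eps, d_e=hamming):
    """Event (6): the encoder output x is too far from the source s."""
    return average_distortion(d_e, s, x) > D_e + eps


def reconstruction_violation(s, decoded, D_r, eps, d_r=hamming):
    """Event (7): the decoder declares failure (None), or its output is too far from s."""
    if decoded is None:
        return True
    return average_distortion(d_r, s, decoded) > D_r + eps


def successful_attack(decoded, s_hat):
    """Event (8): the decoder outputs something other than the authentic reconstruction."""
    return decoded is not None and tuple(decoded) != tuple(s_hat)
```

Note that a declared failure counts toward the distortion violation event (7) but never toward the successful attack event (8), which is why the decoder can always avoid being fooled by refusing to authenticate.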

Definition 2: The achievable distortion region for the problem 
(2) is the closure of the set of pairs $(D_e, D_r)$ such that 
there exists a sequence of authentication systems, indexed by 
$n$, where for every $\epsilon > 0$ and as $n \to \infty$, $\Pr[\mathcal{E}_{SA}] \to 0$ 
regardless of the channel law in effect, $\Pr[\mathcal{E}_{D_e}] \to 0$, and 
$\Pr[\mathcal{E}_{D_r}] \to 0$ when the reference channel is in effect, with 
$\mathcal{E}_{SA}$, $\mathcal{E}_{D_e}$, and $\mathcal{E}_{D_r}$ as defined in (8), (6), and (7). 

For such systems, we have the following coding theorem: 

Theorem 1: The distortion pair $(D_e, D_r)$ lies in the achievable 
distortion region for the problem (2) if and only 
if there exist functions $f(\cdot,\cdot)$, $g(\cdot)$ and a distribution 
$p(y,x,u,s) = p(s)p(u|s)p(x|u,s)p(y|x)$ with $X$ deterministic 
(i.e., $p(x|u,s) = 1_{x = f(s,u)}$) such that 

$I(U;Y) - I(S;U) \geq 0$ (9a) 

$E[d_e(S, f(U,S))] \leq D_e$ (9b) 

$E[d_r(S, g(U))] \leq D_r.$ (9c) 

The alphabet $\mathcal{U}$ of the auxiliary random variable $U$ requires 
cardinality $|\mathcal{U}| \leq (|\mathcal{S}| + |\mathcal{X}| + 3) \cdot |\mathcal{S}| \cdot |\mathcal{X}|$. 5 

Essentially, the auxiliary random variable $U$ represents an 
embedded description of the source that can be authenticated, 
$X$ represents the encoding of the source $S$, and $g(U)$ in (9c) 
represents the authentic reconstruction. The usual condition 
that the channel output is determined from the channel input 
(i.e., the encoder does not know what the channel output will 
be until after the channel input is fixed) is captured by the 
requirement that the full joint distribution $p(y,x,u,s)$ factors 
as shown above. The requirement (1) that the authentic reconstruction 
does not depend directly on the editor’s manipulations 
(i.e., the realization of the reference channel) is captured 
by the fact that $g(\cdot)$ depends only on $U$ and not on $Y$. Without 
the authentication requirement, the set of achievable distortion 
pairs can be enlarged by allowing the reconstruction to depend 
on the channel output, i.e., $g(U)$ in (9c) can be replaced by 
$g(U, Y)$. Thus, as we shall see in Sections VI and VII, security 
comes at a price in this problem. 
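
To make the conditions of Theorem 1 concrete, the following sketch evaluates the three quantities appearing in (9a)-(9c) for a finite-alphabet test distribution. It is a minimal illustration, not part of the paper's development: the function names, the dictionary-based representation of the distributions, and the argument order `f(s, u)` are all assumptions made here for the example.

```python
import math


def mutual_information(joint):
    """Mutual information in bits of a joint pmf given as {(a, b): prob}."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)


def theorem1_quantities(p_s, p_u_given_s, f, g, p_y_given_x, d_e, d_r):
    """Return (I(U;Y) - I(S;U), E[d_e(S, f(S,U))], E[d_r(S, g(U))]).

    p_s:          {s: prob}
    p_u_given_s:  {s: {u: prob}}
    p_y_given_x:  {x: {y: prob}}   (the reference channel)
    f, g:         Python callables, x = f(s, u) and s_hat = g(u)
    """
    p_su, p_uy = {}, {}
    exp_de = exp_dr = 0.0
    for s, ps in p_s.items():
        for u, pu in p_u_given_s[s].items():
            w = ps * pu
            if w == 0:
                continue
            p_su[(s, u)] = p_su.get((s, u), 0.0) + w
            x = f(s, u)                      # deterministic encoder map
            exp_de += w * d_e(s, x)
            exp_dr += w * d_r(s, g(u))       # reconstruction depends on u only
            for y, py in p_y_given_x[x].items():
                p_uy[(u, y)] = p_uy.get((u, y), 0.0) + w * py
    margin = mutual_information(p_uy) - mutual_information(p_su)
    return margin, exp_de, exp_dr
```

A pair $(D_e, D_r)$ is certified by a given test distribution when the returned margin is nonnegative and the two expectations do not exceed $D_e$ and $D_r$, respectively. For instance, with a noiseless channel, $U = S$, $f(s,u) = u$, and $g(u) = u$, the margin is zero and both distortions vanish, consistent with the traditional digital signature case.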

5 If instead $f(U, S)$ is allowed to be a non-deterministic mapping, then it 
is sufficient to consider distributions where the auxiliary random variable has 
the smaller alphabet $|\mathcal{U}| \leq |\mathcal{S}| + |\mathcal{X}| + 3$. 


Theorem 1 has some interesting features. First, it is worth 
noting that since the problem formulation is inherently “analog,” 
dealing only with waveforms, we might expect the best 
solutions to the problem to be analog in nature. However, 
what the theorem suggests, and what its proof confirms, is that 
digital solutions are in fact sufficient to achieve optimality. In 
particular, as we will see, source and channel coding based 
on discrete codebooks are key ingredients of the achievability 
argument. In some sense, this is the consequence of the inherently 
discrete functionality we have required of the decoder 
with our formulation. 

As a second remark, note that Theorem 1 can be contrasted 
with its information embedding counterpart, which, 
as generalized from [19] in [36], states that a pair $(R, D_e)$, 
where $R$ is the embedding rate, is achievable if and 
only if there exists a function $f(\cdot,\cdot)$ and a distribution 
$p(y,x,u,s) = p(s)p(u|s)p(x|s,u)p(y|x)$ with $X$ deterministic 
(i.e., $p(x|u,s) = 1_{x = f(s,u)}$) such that 

$I(U;Y) - I(S;U) \geq R$ (10a) 

$E[d_e(S, f(U,S))] \leq D_e.$ (10b) 

Thus we see that the authentication problem is substantially 
different from the information embedding problem. 

Before developing the proofs of Theorem 1, to develop 
intuition we describe the general system structure, and its 
specialization to the Gaussian-quadratic case. 

A. General System Structure 

As developed in detail in Section V, an optimal authentication 
system can be constructed by choosing a codebook $\mathcal{C}$ with 
codewords appropriately distributed over the space of possible 
source outcomes. The elements of a randomly chosen subset 
of these codewords $\mathcal{A} \subset \mathcal{C}$ are marked as admissible and the 
knowledge of $\mathcal{A}$ is a secret shared between the encoder and 
decoder, and kept from editors. 

The encoder maps (quantizes) the source $S^n$ to the nearest 
admissible codeword $U^n$ and then generates the channel input 
$X^n$ from $U^n$. The decoder maps the signal it obtains to the 
nearest codeword $C^n \in \mathcal{C}$. If $C^n \in \mathcal{A}$, i.e., $C^n$ is an admissible 
codeword, the decoder produces the reconstruction $\hat{S}^n$ from 
$C^n$. If $C^n \notin \mathcal{A}$, i.e., $C^n$ is not admissible, the decoder declares 
that an authentic reconstruction is not possible. 

Observe that $\mathcal{A}$ must have the following three characteristics. 
First, to avoid a successful attack the number of 
admissible codewords must be appropriately small. Indeed, 
since attackers do not know $\mathcal{A}$, if an attacker’s tampering 
causes the decoder to decode to any codeword other than 
$U^n$ then the probability that the decoder is fooled by the 
tampering and does not declare a decoding failure is bounded 
by $|\mathcal{A}| / |\mathcal{C}|$. Second, to avoid an encoding distortion violation, 
the set of admissible codewords should be dense enough to 
allow the encoder to find an appropriate $X^n$ near $S^n$. Third, to 
avoid a reconstruction distortion violation, the decoder should 
be able to distinguish the possible encoded signals at the 
output of the reference channel. Thus the codewords should be 
sufficiently separated that they can be resolved at the output 
of the reference channel. 
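
The structure just described can be sketched with a toy binary system. This is only an illustration under simplifying assumptions that are not taken from the paper: the alphabet is binary, "nearest" means Hamming distance, the channel input is the chosen codeword itself ($X^n = U^n$), the reconstruction is that same codeword, and the codebook sizes are arbitrary. All function names are hypothetical.

```python
import random


def hamming(a, b):
    """Number of positions in which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def make_codebooks(n, num_codewords, num_admissible, seed):
    """Draw a public codebook C of distinct length-n binary words and a
    secret random subset A of admissible codewords."""
    rng = random.Random(seed)
    ints = rng.sample(range(2 ** n), num_codewords)   # distinct codewords
    codebook = [tuple((v >> i) & 1 for i in range(n)) for v in ints]
    admissible = set(rng.sample(codebook, num_admissible))
    return codebook, admissible


def encode(source, admissible):
    """Quantize the source to the nearest admissible codeword (needs the secret A)."""
    return min(sorted(admissible), key=lambda c: hamming(c, source))


def decode(received, codebook, admissible):
    """Map to the nearest codeword in C; authenticate only if it lies in A.
    Returns None in place of the special failure symbol."""
    nearest = min(codebook, key=lambda c: hamming(c, received))
    return nearest if nearest in admissible else None


codebook, admissible = make_codebooks(n=16, num_codewords=64,
                                      num_admissible=8, seed=0)
x = encode((0,) * 16, admissible)
print(decode(x, codebook, admissible) == x)   # unedited content authenticates
```

An attacker who does not know the admissible set and moves the signal onto a different codeword of the public codebook hits an admissible one with probability of roughly 8/64 in this toy setting; the asymptotic constructions drive the corresponding fraction to zero.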
1) Geometry for Gaussian-Quadratic Example: We illustrate 
the system geometry in the case of a white Gaussian 
source, quadratic distortion measure, and an additive white 
Gaussian noise reference channel, in the high signal-to-noise 
ratio (SNR) regime. We let $\sigma_S^2$ and $\sigma_N^2$ denote the source 
and channel variances, respectively. For this example, we can 
construct $\mathcal{C}$ by packing codewords into the space of possible 
source vectors such that no codeword is closer than some 
distance $r\sqrt{n}$ to any other, i.e., packing spheres of radius 
$r\sqrt{n}$ into a sphere of radius $\sigma_S\sqrt{n}$ where the centers of the 
spheres correspond to codewords. Next, a fraction $2^{-n\gamma}$ of 
the codewords in $\mathcal{C}$ are chosen at random and marked as 
admissible to form $\mathcal{A}$. It suffices to let $\gamma = 1/\sqrt{n}$ and 
$r^2 = \sigma_N^2 + \epsilon$ for some $\epsilon > 0$ that is arbitrarily small. This 
construction is illustrated in Fig. 3. 

The encoder maps the source $S^n$ to a nearby admissible 
codeword $U^n$, which it chooses as the encoding $X^n$. Since 
the number of admissible codewords in a sphere of radius $d$ 
centered on $S^n$ is roughly 

$\frac{|\mathcal{A}|}{|\mathcal{C}|} \left(\frac{d}{r}\right)^n,$ 

on average there exists at least one codeword within distance 
$d$ of the source provided $d > r 2^{\gamma}$. Thus, the average encoding 
distortion is roughly $r^2 2^{2\gamma}$, which approaches $\sigma_N^2 + \epsilon$ as 
$n \to \infty$. 

The authentic reconstruction is $\hat{S}^n = U^n$. Thus, when the 
decoder correctly identifies $U^n$, the reconstruction distortion 
is the same as the encoding distortion. And when the reference 
channel is in effect, the decoder does indeed correctly identify 
$U^n$. This follows from the fact that with high probability, the 
reference channel noise creates a perturbation within a noise 
sphere of radius $\sigma_N\sqrt{n}$ about the encoding $X^n$, and the noise 
spheres do not intersect since $r > \sigma_N$. 

Furthermore, when the reference channel is not in effect 
and an attacker tampers with the signal such that the nearest 
codeword $C^n$ is different from that chosen by the encoder, $U^n$, 
then the probability that $C^n$ was marked as admissible in the 
codebook construction phase is 

$\Pr[C^n \in \mathcal{A} \mid C^n \neq U^n] = \frac{|\mathcal{A}|}{|\mathcal{C}|} = 2^{-n\gamma},$ 

which goes to zero as $n \to \infty$. The decoder generates $\varnothing$ if it 
decodes to a non-admissible codeword, so the probability of 
a nonauthentic reconstruction is vanishingly small. 

Thus the distortions $D_e = D_r = \sigma_N^2$ can be approached 
with an arbitrarily small probability of successful attack. See 
[49], [50] for insights into the practical implementation of this 
class of systems including those designed based on a public 
key instead of a secret key. 
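
The scaling in this example is easy to check numerically. The short sketch below evaluates the approximate encoding distortion $r^2 2^{2\gamma}$ and the attack probability $2^{-n\gamma}$ for the choices $\gamma = 1/\sqrt{n}$ and $r^2 = \sigma_N^2 + \epsilon$ stated above; the function name and the particular values of $n$, $\sigma_N^2$, and $\epsilon$ are arbitrary choices made here for illustration.

```python
import math


def gaussian_example(n, sigma_n2, eps):
    """Return (gamma, approximate encoding distortion, attack probability)
    for block length n, channel variance sigma_n2, and slack eps."""
    gamma = 1.0 / math.sqrt(n)                  # gamma = 1/sqrt(n)
    r2 = sigma_n2 + eps                         # r^2 = sigma_N^2 + eps
    distortion = r2 * 2.0 ** (2.0 * gamma)      # r^2 * 2^(2*gamma)
    attack_prob = 2.0 ** (-n * gamma)           # |A|/|C| = 2^(-n*gamma)
    return gamma, distortion, attack_prob


for n in (100, 10_000, 1_000_000):
    gamma, distortion, attack_prob = gaussian_example(n, sigma_n2=1.0, eps=0.01)
    print(f"n={n:>9}  gamma={gamma:.4f}  "
          f"distortion={distortion:.4f}  attack_prob={attack_prob:.3e}")
```

As $n$ grows, the distortion factor $2^{2\gamma}$ tends to one while $2^{-n\gamma} = 2^{-\sqrt{n}}$ vanishes, which is the sense in which both requirements can be met at once.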

V. Proofs 

A. Forward Part: Sufficiency 

Here we show that if there exist distributions and functions 
satisfying (9), then for every $\epsilon > 0$ there exists a sequence of 
authentication systems with distortion at most $(D_e + \epsilon, D_r + 
\epsilon)$. Since the achievable distortion region is a closed set this 
implies that $(D_e, D_r)$ lies in the achievable distortion region. 


We prove this forward part of Theorem 1 by showing the 
existence of a random code with the desired properties. 

1) Codebook Generation: We begin by choosing some $\gamma > 0$ such that

$$I(Y;U) - I(U;S) > 3\gamma, \qquad (11)$$

where $\gamma$ decays to zero more slowly than $1/n$, i.e.,

$$\gamma \to 0 \text{ and } n\gamma \to \infty \text{ as } n \to \infty. \qquad (12)$$

Given the choice of $\gamma$, the encoder chooses a random codebook
$\mathcal{C}$ of rate

$$R = I(S;U) + 2\gamma. \qquad (13)$$

Each of the $2^{nR}$ codewords in $\mathcal{C}$ is a sequence of $n$ i.i.d. random
variables selected according to the distribution $p(u) = \sum_{s \in \mathcal{S}} p(u|s)p(s)$.
Then, for each realized codebook $\mathcal{C}$ the
encoder randomly marks $2^{n(R-\gamma)}$ of the codewords in $\mathcal{C}$ as
admissible and the others as forbidden. We denote this new
codebook of admissible codewords as $\mathcal{A}$, which has effective
rate

$$R' = R - \gamma = I(S;U) + \gamma, \qquad (14)$$

where the last equality follows from substituting (13). The
knowledge of which codewords are forbidden is the secret
key and is revealed only to the decoder. The codebook $\mathcal{C}$ is
publicly revealed.
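The codebook construction can be sketched at toy scale as follows. This is only an illustration under stated assumptions: the function name `generate_codebook`, the dictionary representation of $p(u)$, and the rounding of $2^{nR}$ to an integer are not from the paper, and practical block lengths would make explicit enumeration infeasible.

```python
import random


def generate_codebook(n, rate, gamma, p_u, seed=0):
    """Draw 2^(n*rate) i.i.d. codewords from p(u) and mark 2^(n*(rate-gamma)) as admissible."""
    rng = random.Random(seed)
    symbols = list(p_u)
    weights = [p_u[u] for u in symbols]
    num_codewords = round(2 ** (n * rate))
    num_admissible = round(2 ** (n * (rate - gamma)))
    # Public codebook C: each codeword is n i.i.d. draws from p(u).
    codebook = [tuple(rng.choices(symbols, weights=weights, k=n))
                for _ in range(num_codewords)]
    # Secret key: the index set of admissible codewords (the set A).
    admissible = set(rng.sample(range(num_codewords), num_admissible))
    return codebook, admissible


codebook, admissible = generate_codebook(n=10, rate=0.8, gamma=0.2, p_u={0: 0.5, 1: 0.5})
print(len(codebook), len(admissible), len(admissible) / len(codebook))
```

Here the admissible fraction equals $2^{-n\gamma} = 2^{-2}$, matching the ratio of the rates in (13) and (14).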

2) Encoding and Decoding: The encoder first tries to find
an admissible codeword $u^n \in \mathcal{A}$ that is $\delta$-strongly jointly
typical with its source sequence $s^n$ according to $p(u|s)$. If
such a codeword is found, the encoder
output is produced by mapping the pair $(s^n, u^n)$ into $x^n$ componentwise via
$x_i = f(s_i, u_i)$. If no jointly typical admissible codeword exists,
the encoder expects the system to fail, and thus selects an
arbitrary codeword.

The decoder attempts to produce the authentic reconstruction
$\hat{s}^n = g_n(u^n)$ where

$$g_n(u^n) = (g(u_1), g(u_2), \ldots, g(u_n)). \qquad (15)$$

The decoder $\Phi_n(\cdot)$ tries to deduce $\hat{s}^n$ by searching for a unique
admissible codeword $u^n \in \mathcal{A}$ that is $\delta$-strongly jointly typical
with the obtained sequence $Y^n$. If such a codeword is found,
the reconstruction produced is $g_n(u^n)$. If no such unique
codeword is found, the decoder produces the output symbol
$\emptyset$.

3) System Failure Probabilities: We begin by analyzing the 
system failure probabilities. 

a) Probability of Successful Attack: Suppose the attacker
causes the sequence obtained by the decoder to be
jointly typical with a unique codeword $c^n \in \mathcal{C}$. Since the
attacker has no knowledge of which codewords are admissible,
the probability that codeword $c^n$ was chosen as admissible in
the codebook construction phase is

$$\Pr[c^n \in \mathcal{A}] = \frac{|\mathcal{A}|}{|\mathcal{C}|} = \frac{2^{nR'}}{2^{nR}} = 2^{-n\gamma},$$

where we have used (13) and (14). Therefore,

$$\Pr[\mathcal{E}_{sa}] \le \Pr[\Phi_n(Y^n) \ne \emptyset \mid \Phi_n(Y^n) \ne \hat{S}^n] = 2^{-n\gamma},$$

which goes to zero according to (12). Note that this argument
applies regardless of the method used by the attacker since
without access to the secret key its actions are statistically
independent of which codewords are admissible.

Fig. 3. Codebook construction for the Gaussian-quadratic scenario. The large sphere represents the space of possible source vectors and the small spheres
representing the noise are centered on codewords. When the small spheres do not overlap, the codewords can be resolved at the output of the reference
channel. The shaded spheres represent the admissible codewords, a secret known only to the encoder and decoder.
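The independence argument can be checked empirically at toy scale. The sketch below is an illustration only: the function name `estimate_attack_success` and the uniform choice of the attacker's target are assumptions, standing in for any strategy that does not depend on the secret key.

```python
import random


def estimate_attack_success(num_codewords, num_admissible, trials, seed=0):
    """Estimate the chance that a key-independent target codeword is admissible."""
    rng = random.Random(seed)
    # Secret key: which codeword indices are admissible.
    admissible = set(rng.sample(range(num_codewords), num_admissible))
    # The attacker steers the decoder toward a codeword chosen without the key.
    hits = sum(rng.randrange(num_codewords) in admissible for _ in range(trials))
    return hits / trials


# With |C| = 2^12 and |A| = 2^8 the predicted probability is |A|/|C| = 2^-4.
print(estimate_attack_success(4096, 256, trials=20000), 256 / 4096)
```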

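b) Probability of Distortion Violation: The distortion
violation events $\mathcal{E}_{D_e}$ and $\mathcal{E}_{D_r}$ defined earlier can arise
due to any of the following typicality failure events:

• $\mathcal{E}_{st}$: The source is not typical.

• $\mathcal{E}_{et}$: The encoder fails to find an admissible codeword
that is jointly typical with its input.

• $\mathcal{E}_{ct}$: The channel fails to produce an output jointly typical
with its input when the reference channel law is in effect.

• $\mathcal{E}_{dt}$: The decoder fails to find a codeword jointly typical
with its input when the reference channel law is in effect.

A distortion violation event can also occur if there is no
typicality failure but the distortion is still too high. Letting

$$\mathcal{E}_{tf} = \mathcal{E}_{st} \cup \mathcal{E}_{et} \cup \mathcal{E}_{ct} \cup \mathcal{E}_{dt} \qquad (16)$$

denote the typicality failure event, we have that the
probability of a distortion violation can be bounded as

$$\begin{aligned}
\Pr[\mathcal{E}_{dv}] &= \Pr[\mathcal{E}_{dv} \mid \mathcal{E}_{tf}] \Pr[\mathcal{E}_{tf}] + \Pr[\mathcal{E}_{dv} \mid \mathcal{E}_{tf}^c] \Pr[\mathcal{E}_{tf}^c] \\
&\le \Pr[\mathcal{E}_{dv} \mid \mathcal{E}_{tf}^c] + \Pr[\mathcal{E}_{tf}] \\
&\le \Pr[\mathcal{E}_{dv} \mid \mathcal{E}_{tf}^c] + \Pr[\mathcal{E}_{st}] + \Pr[\mathcal{E}_{et} \mid \mathcal{E}_{st}^c] \\
&\quad + \Pr[\mathcal{E}_{ct} \mid \mathcal{E}_{st}^c, \mathcal{E}_{et}^c] + \Pr[\mathcal{E}_{dt} \mid \mathcal{E}_{st}^c, \mathcal{E}_{et}^c, \mathcal{E}_{ct}^c]. \qquad (17)
\end{aligned}$$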

First, according to well-known properties of typical sequences
[43], by choosing $n$ large enough we can make

$$\Pr[\mathcal{E}_{st}] \le \epsilon/4 \qquad (18)$$

$$\Pr[\mathcal{E}_{ct} \mid \mathcal{E}_{st}^c, \mathcal{E}_{et}^c] \le \epsilon/4. \qquad (19)$$

Second, provided that the source is typical, the probability
that the encoder fails to find a sequence $u^n \in \mathcal{A}$ jointly typical
with the source follows from (14) as

$$\Pr[\mathcal{E}_{et} \mid \mathcal{E}_{st}^c] \le 2^{-n[R' - I(S;U)]} = 2^{-n\gamma} \qquad (20)$$

from standard joint typicality arguments.

Third,

$$\Pr[\mathcal{E}_{dt} \mid \mathcal{E}_{st}^c, \mathcal{E}_{et}^c, \mathcal{E}_{ct}^c] \le 2^{-n\gamma} + \epsilon/4. \qquad (21)$$

Indeed, using standard joint typicality results, the probability
that the sequence $Y^n$ presented to the decoder is not $\delta$-strongly
jointly typical with the correct codeword $U^n$ selected by the
encoder can be made smaller than $\epsilon/4$ for $n$ large enough,
and the probability of it being strongly jointly typical with
any other admissible codeword is, using (13) with (11), at
most

$$2^{-n[I(U;Y) - R]} \le 2^{-n\gamma}.$$

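Fourth,

$$\Pr[\mathcal{E}_{dv} \mid \mathcal{E}_{tf}^c] = 0. \qquad (22)$$

Indeed, provided there are no typicality failures, the pair
$(S^n, Y^n)$ must be strongly jointly typical, so by the standard
properties of strong joint typicality,

$$\frac{1}{n}\sum_{i=1}^{n} d_e(S_i, X_i) \le E[d_e(S, X)] + \delta \cdot \bar{d}_1$$

$$\frac{1}{n}\sum_{i=1}^{n} d_r(S_i, g(U_i)) \le E[d_r(S, g(U))] + \delta \cdot \bar{d}_2,$$

where $\bar{d}_1$ and $\bar{d}_2$ are bounds defined via

$$\bar{d}_1 = \sup_{(s,x) \in \mathcal{S} \times \mathcal{X}} d_e(s, x) \qquad (23)$$

$$\bar{d}_2 = \sup_{(s,\hat{s}) \in \mathcal{S} \times \hat{\mathcal{S}}} d_r(s, \hat{s}). \qquad (24)$$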

IEEE TRANS. INFORM. THEORY, VOL. X, NO. XX, 2005 


Thus, choosing $\delta$ such that

$$\delta < \min\left(\frac{\epsilon}{\bar{d}_1}, \frac{\epsilon}{\bar{d}_2}\right)$$

and making $n$ large enough we obtain (22).

Finally, using (18), (19), (20), (21), and (22) in (17) we
obtain

$$\Pr[\mathcal{E}_{dv}] \le 3\epsilon/4 + 2 \cdot 2^{-n\gamma} \qquad (25)$$

which can be made less than $\epsilon$ for $n$ large enough. Thus
$\Pr[\mathcal{E}_{D_e}] \to 0$ and, when the reference channel is in effect,
$\Pr[\mathcal{E}_{D_r}] \to 0$.

B. Converse Part: Necessity

Here we show that if there exists an authentication system
where the pair $(D_e, D_r)$ is in the achievable distortion region,
then there exists a distribution $p(u|s)$ and functions $g(\cdot)$, $f(\cdot,\cdot)$
satisfying (9). In order to apply previously developed tools, it
is convenient to define the rate-function

$$R^*(D_e, D_r) = \sup_{\substack{p(U|S),\ f:\,\mathcal{U} \times \mathcal{S} \to \mathcal{X},\ g:\,\mathcal{U} \to \hat{\mathcal{S}}\ : \\ E[d_e(S, f(U,S))] \le D_e,\ E[d_r(S, g(U))] \le D_r}} I(U;Y) - I(S;U). \qquad (26)$$

Note that $R^*(D_e, D_r) \ge 0$ if and only if the conditions in (9)
are satisfied. Thus our strategy is to assume that the sequence
of encoding and decoding functions discussed earlier
exist with $\lim_{n\to\infty} \Pr[\mathcal{E}_{sa}] = 0$, $\lim_{n\to\infty} \Pr[\mathcal{E}_{D_e}] = 0$, and,
when the reference channel is in effect, $\lim_{n\to\infty} \Pr[\mathcal{E}_{D_r}] = 0$.
We then show that these functions imply that $R^*(D_e, D_r) \ge 0$
and hence (9) is satisfied.

To begin we note that it suffices to choose $g(\cdot)$ to be the
minimum distortion estimator of $S$ given $U$. Next, by using
techniques from [19] or by directly applying [36, Lemma 2]
it is possible to prove that allowing $X$ to be non-deterministic
has no advantage, i.e.,

$$R^*(D_e, D_r) \ge \sup_{\substack{p(U|S),\ p(X|U,S)\ : \\ E[d_e(S,X)] \le D_e,\ E[d_r(S, g(U))] \le D_r}} I(U;Y) - I(S;U). \qquad (27)$$

Arguments similar to those in [19] and [36, Lemma 1] show
that $R^*(D_e, D_r)$ is monotonically non-decreasing and concave
in $(D_e, D_r)$. These properties will later allow us to make use
of the following lemma, whose proof follows readily from that
of Lemma 4 in [19]:

Lemma 1: For arbitrary random variables
$V, A_1, A_2, \ldots, A_n$ and a sequence of i.i.d. random variables
$S_1, S_2, \ldots, S_n$,

$$\sum_{i=1}^{n} \left[ I(V, A_1^{i-1}, S_{i+1}^n; A_i) - I(V, A_1^{i-1}, S_{i+1}^n; S_i) \right] \ge I(V; A^n) - I(V; S^n). \qquad (28)$$

As demonstrated by the following Lemma, a suitable $U_i$ is

$$U_i = (\hat{S}^n, Y_1^{i-1}, S_{i+1}^n). \qquad (29)$$

Lemma 2: The choice of $U_i$ in (29) satisfies the Markov
relationship

$$Y_i \leftrightarrow (S_i, X_i) \leftrightarrow U_i. \qquad (30)$$

Proof: It suffices to note that

$$p(y_i|x_i, s_i) = p(y_i|x_i) = \frac{p(y_1^i|x^n)}{p(y_1^{i-1}|x^n)} = \frac{p(y_1^i|x^n, s^n)}{p(y_1^{i-1}|x^n, s^n)} \qquad (31)$$

$$= \frac{p(y_1^i|x^n, s^n, \hat{s}^n)}{p(y_1^{i-1}|x^n, s^n, \hat{s}^n)} = p(y_i|x^n, s^n, \hat{s}^n, y_1^{i-1}) \qquad (32)$$

where the equalities in (31) follow from the memoryless
channel model, and the first equality in (32) follows from the
fact that the system generates authentic reconstructions.
Thus, (32) implies the Markov relationship

$$Y_i \leftrightarrow (X_i, S_i) \leftrightarrow (X_1^{i-1}, X_{i+1}^n, S_1^{i-1}, S_{i+1}^n, Y_1^{i-1}, \hat{S}^n), \qquad (33)$$

which by deleting selected terms from the right hand side
yields (30). ∎

Next, we combine these results to prove the converse part
of Theorem 1 except for the cardinality bound on $\mathcal{U}$, which is
derived immediately thereafter.

Lemma 3: If a sequence of encoding and decoding functions
$\Upsilon_n(\cdot)$ and $\Phi_n(\cdot)$ exist such that the decoder can generate
authentic reconstructions achieving the distortion pair
$(D_e, D_r)$ when the reference channel is in effect, then

$$R^*(D_e, D_r) \ge 0. \qquad (34)$$

Proof: Define $D_{e,i}$ and $D_{r,i}$ as the component-wise
distortions between $S_i$ and $X_i$ and between $S_i$ and $\hat{S}_i$. We
have the following chain of inequalities:

$$\begin{aligned}
R^*(D_e, D_r) &= R^*\left(\frac{1}{n}\sum_{i=1}^{n} D_{e,i},\ \frac{1}{n}\sum_{i=1}^{n} D_{r,i}\right) && (35) \\
&\ge \frac{1}{n}\sum_{i=1}^{n} R^*(D_{e,i}, D_{r,i}) && (36) \\
&\ge \frac{1}{n}\sum_{i=1}^{n} \left[ I(U_i; Y_i) - I(U_i; S_i) \right] && (37) \\
&\ge \frac{1}{n}\left[ I(\hat{S}^n; Y^n) - I(\hat{S}^n; S^n) \right] && (38) \\
&= \frac{1}{n}\left[ H(\hat{S}^n|S^n) - H(\hat{S}^n|Y^n) \right] && (39) \\
&\ge -\frac{1}{n} H(\hat{S}^n|Y^n) && (40) \\
&\ge -\frac{1}{n} - \Pr[\Phi_n(Y^n) \ne \hat{S}^n] \log|\mathcal{S}|. && (41)
\end{aligned}$$


The concavity of $R^*(D_e, D_r)$ yields (36). To obtain (37), we
combine Lemma 2 with (27). Next, to obtain (38), let $V = \hat{S}^n$
and $A_i = Y_i$ to apply Lemma 1 with $U_i$ chosen according to
(29). Fano's inequality yields (41).

Finally, using (in order) Bayes' law and the definitions of the events $\mathcal{E}_{sa}$ and $\mathcal{E}_{D_r}$, we obtain

$$\begin{aligned}
\Pr[\Phi_n(Y^n) \ne \hat{S}^n] &= \Pr[\mathcal{E}_{sa}] + \Pr[\{\Phi_n(Y^n) \ne \hat{S}^n\} \cap \{\Phi_n(Y^n) = \emptyset\}] && (42) \\
&\le \Pr[\mathcal{E}_{sa}] + \Pr[\{\Phi_n(Y^n) = \emptyset\}] && (43) \\
&\le \Pr[\mathcal{E}_{sa}] + \Pr[\mathcal{E}_{D_r}]. && (44)
\end{aligned}$$

Therefore, exploiting that the system generates an authentic
reconstruction ($\lim_{n\to\infty} \Pr[\mathcal{E}_{sa}] = 0$) of the right distortion
($\lim_{n\to\infty} \Pr[\mathcal{E}_{D_r}] = 0$) and that the alphabet of $S$ is finite,
we have that (41) and (44) imply (34). ∎

The following proposition bounds the cardinality of $\mathcal{U}$.

Proposition 1: Any point in the achievable distortion region
defined by (9) can be attained with $U$ distributed over an
alphabet $\mathcal{U}$ of cardinality at most $(|\mathcal{S}| + |\mathcal{X}| + 3) \cdot |\mathcal{S}| \cdot |\mathcal{X}|$
with $p(x|u,s)$ singular, or over an alphabet $\mathcal{U}$ of cardinality at
most $|\mathcal{S}| + |\mathcal{X}| + 3$ if $p(x|u,s)$ is not required to be singular.

Proof: This can be proved using standard tools from
convex set theory. Essentially, we define a convex set of
continuous functions $f_j(p)$ where $p$ represents a distribution
of the form $\Pr(S = s, X = x \mid U = u)$ and the $f_j(\cdot)$ functions
capture the features of the distributions relevant to (9). According
to Carathéodory's Theorem [43, Theorem 14.3.4], [51],
there exist $j_{\max} + 1$ distributions $p_1$ through $p_{j_{\max}+1}$ such that
any vector of function values $(f_1(p'), f_2(p'), \ldots, f_{j_{\max}}(p'))$
achieved by some distribution $p'$ can be achieved with a convex
combination of the $p_i$ distributions. Since each distribution
corresponds to a particular choice for $U$, at most $j_{\max} + 1$
possible values are required for $U$. Specifically, the desired
cardinality bound for our problem can be proved by making
the following syntactical modifications to the argument in [52,
bottom left of p. 634]:

1) Replace $\Pr(X = x \mid U = u)$ with $\Pr(S = s, X = x \mid U = u)$, which is represented by the notation $p$.

2) Choose

$$f_j(p) = \sum_{x} \Pr(S = j, X = x \mid U = u) \qquad (45)$$

for $j \in \{1, 2, \ldots, n\}$ where $n = |\mathcal{S}|$.

3) Choose

$$f_{n+1}(p) = \sum_{s}\sum_{x} d_e(x, s) \Pr(S = s, X = x \mid U = u). \qquad (46)$$

4) Choose

$$f_{n+2}(p) = \sum_{s}\sum_{x} d_r(g(u), s) \Pr(S = s, X = x \mid U = u). \qquad (47)$$

5) Choose

$$f_{n+3}(p) = \sum_{s} \left[\sum_{x} \Pr(S = s, X = x \mid U = u)\right] \cdot \log\left(\sum_{x} \Pr(S = s, X = x \mid U = u)\right). \qquad (48)$$

6) Let

$$m(s, u, x, y) = \Pr(Y = y \mid X = x) \Pr(S = s, X = x \mid U = u)$$

and choose

$$f_{n+4}(p) = \sum_{y} \left(\sum_{x}\sum_{s} m(s, u, x, y)\right) \log\left(\sum_{x}\sum_{s} m(s, u, x, y)\right). \qquad (49)$$

7) Choose

$$f_{n+5+j}(p) = \sum_{s} \Pr(S = s, X = j \mid U = u) \qquad (50)$$

for $j \in \{1, 2, \ldots, |\mathcal{X}|\}$.

Since the $f_j(p')$ determine $\Pr[S = s]$ (and therefore $H(S)$
as well), $D_e$, $D_r$, $H(S|U)$, $H(Y|U)$, and $\Pr[X = x]$ (and
therefore $\Pr[Y = y]$ and $H(Y)$ also), they can be used to
identify all points in the distortion region. According to [52,
Lemma 3], for every point in this region obtained over the
alphabet $\mathcal{U}$ there exists a $U^*$ from alphabet $\mathcal{U}^*$ with cardinality
$|\mathcal{U}^*|$ at most one greater than the dimension of the space
spanned by the vectors $f_j$. The $f_j$ corresponding to $\Pr[S = s]$
and $\Pr[X = x]$ contribute $|\mathcal{S}| - 1$ and $|\mathcal{X}| - 1$ dimensions while
the other $f_j$ contribute four more dimensions. Thus it suffices
to choose $|\mathcal{U}^*| \le |\mathcal{X}| + |\mathcal{S}| + 3$. Note that this cardinality
bound applies to the general case where $X$ is not necessarily
a deterministic function of $S$ and $U^*$.

By directly applying [36, Lemma 2] to each pair $(u^*, s)$ in
$\mathcal{U}^* \times \mathcal{S}$, we can split each $u^*$ into $|\mathcal{X}|$ new symbols $u^{**}$ such
that the mapping from $(u^{**}, s)$ to $x$ is deterministic. The new
auxiliary random variable $U^{**}$ takes values over the alphabet
$\mathcal{U}^{**}$ where

$$|\mathcal{U}^{**}| = |\mathcal{U}^*| \cdot |\mathcal{S}| \cdot |\mathcal{X}| = (|\mathcal{X}| + |\mathcal{S}| + 3) \cdot |\mathcal{S}| \cdot |\mathcal{X}|. \qquad (51)$$

Furthermore, this process does not change the distortion or
violate the mutual information constraint. Thus a deterministic
mapping from the source and auxiliary random variable to the
channel input can be found with no loss of optimality provided
a potentially larger alphabet is allowed for the auxiliary
random variable. ∎

We next apply Theorem 1 to two example scenarios of
interest, one discrete and one continuous.

VI. Example: the Binary-Hamming Scenario 

In some applications of authentication, the content of interest
is inherently discrete. For example, we might be interested
in authenticating a passage of text, some of whose characters
may have been altered in a benign manner through errors
in an optical character recognition process or error-prone human
transcription during scanning. Or the alterations might be by
the hand of a human editor whose job it is to correct, refine,
or otherwise enhance the exposition in preparation for its
publication in a paper, journal, magazine, or book. Or the
alterations may be the result of an attacker deliberately
tampering with the text for the purpose of distorting its
meaning and affecting how it will be interpreted.

As perhaps the simplest model representative of such discrete
problems, we now consider a symmetric binary source
with a binary symmetric reference channel. Specifically, we
model the source as an i.i.d. sequence where each $S_i$ is a
Bernoulli(1/2) random variable$^6$ and the reference channel
output is $Y_i = X_i \oplus N_i$, where $\oplus$ denotes modulo-2 addition
and where $N^n$ is an i.i.d. sequence of Bernoulli($p$) random
variables. Finally, we adopt the Hamming distortion measure:

$$d(a, b) = \begin{cases} 0, & \text{if } a = b \\ 1, & \text{otherwise.} \end{cases}$$

For this problem, a suitable auxiliary random variable is

$$U = \{S \oplus (A \cdot T) \oplus [(1 - A) \cdot V]\} + 2 \cdot (1 - A), \qquad (52)$$

where $A$, $T$, and $V$ are Bernoulli $a$, $\tau$, and $\nu$ random variables,
respectively, and are independent of each other and $S$ and
$N$. Without loss of generality, the parameters $\tau$ and $\nu$ are
restricted to the range $(0, 1/2)$. Note that $\mathcal{U} = \{0, 1, 2, 3\}$.

The encoder function $X = f(S, U)$ is, in turn, given by

$$X = \begin{cases} U, & \text{if } U \in \{0, 1\} \\ S, & \text{if } U \in \{2, 3\}, \end{cases} \qquad (53)$$

from which it is straightforward to verify via (52) that the
encoding distortion is

$$D_e = a\tau. \qquad (54)$$

The corresponding decoder function $\hat{S} = g(U)$ takes the
form

$$\hat{S} = U \bmod 2, \qquad (55)$$

from which it is straightforward to verify via (52) that the
reconstruction distortion is

$$D_r = a\tau + (1 - a)\nu. \qquad (56)$$

In addition, $I(U;S)$ takes the form

$$\begin{aligned}
I(U;S) &= H(S) - H(S|U) \\
&= H(S) - H(S, A|U) + H(A|U, S) \\
&= H(S) - H(S|U, A) - H(A|U) + H(A|U, S) \\
&= 1 - a \cdot h(\tau) - (1 - a) \cdot h(\nu), \qquad (57)
\end{aligned}$$

where the second and third equalities follow from the entropy
chain rule, where the last two terms on the third line are
zero because knowing $U$ determines $A$, and where the last
equality follows from (52), with $h(\cdot)$ denoting the binary
entropy function, i.e., $h(q) = -q \log q - (1 - q) \log(1 - q)$
for $0 < q < 1$. Similarly, $I(U;Y)$ takes the form

$$\begin{aligned}
I(U;Y) &= H(Y) - H(Y|U) \\
&= H(Y) - H(Y, A|U) + H(A|U, Y) \\
&= H(Y) - H(Y|U, A) - H(A|U) + H(A|U, Y) && (58) \\
&= 1 - a\,h(p) - (1 - a)\,h\big(p(1 - \nu) + (1 - p)\nu\big). && (59)
\end{aligned}$$


6 We adopt the convention that all Bernoulli random variables take values 
in the set {0,1}. 



Fig. 4. The solid curve represents the frontier of the achievable distortion 
region for a binary symmetric source and a binary symmetric reference 
channel with cross-over probability p = 0.2. This plot reflects the system 
behavior when the reference channel is in effect. The dashed line represents 
the boundary of the larger distortion region achievable when authentication is 
not required. 


For a fixed $p$, varying the parameters $a$, $\nu$, and $\tau$ such that
(59) is at least as big as (57), as required by (9a), generates the
achievable distortion region shown in Fig. 4. Note from (59),
(57), (54) and (56) that the boundary point $D_e = D_r = p$,
in particular, is obtained by the parameter values $a = 1$ and
$\tau = p$ (with any choice of $\nu$). Numerical optimization over
all $p(u|s)$ and all (not necessarily singular) $p(x|s,u)$ with the
alphabet size $|\mathcal{U}| = 7$ chosen in accordance with Proposition 1
confirms that Fig. 4 captures all achievable distortion pairs.
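The expressions (54), (56), (57), and (59) are simple enough to evaluate directly. The following sketch (function names are ours, chosen for illustration) returns the distortion pair for given parameters together with the slack $I(U;Y) - I(U;S)$, which must be nonnegative for the pair to be achievable with this choice of $U$.

```python
import math


def h(q):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)


def binary_hamming_point(a, tau, nu, p):
    """Return (D_e, D_r, I(U;Y) - I(U;S)) for the auxiliary variable in (52)."""
    d_e = a * tau                                    # (54)
    d_r = a * tau + (1.0 - a) * nu                   # (56)
    i_us = 1.0 - a * h(tau) - (1.0 - a) * h(nu)      # (57)
    i_uy = 1.0 - a * h(p) - (1.0 - a) * h(p * (1.0 - nu) + (1.0 - p) * nu)  # (59)
    return d_e, d_r, i_uy - i_us


# Boundary point D_e = D_r = p at a = 1, tau = p: the slack is exactly zero.
print(binary_hamming_point(a=1.0, tau=0.2, nu=0.1, p=0.2))
# A smaller tau with a = 1 violates the mutual information constraint.
print(binary_hamming_point(a=1.0, tau=0.1, nu=0.1, p=0.2))
```

Sweeping $a$, $\tau$, and $\nu$ over a grid and keeping the points with nonnegative slack gives a numerical picture of the region attainable with this parameterization.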

For comparison, we can also develop the achievable distortion
region when authentication is not required. In this
setting the goal is to provide a representation of the source
which allows a decoder to obtain a good reconstruction from
the reference channel output while keeping the encoding
distortion small. Although in general hybrid analog-digital
coding schemes can be used [36], optimality can also be
achieved without any coding in the binary-Hamming case and
thus all points in the region $D_e \ge 0$ and $D_r \ge p$ are achievable,
as also shown in Fig. 4. Thus we see that the requirement that
reconstructions be authentic strictly decreases the achievable
distortion region, as shown in Fig. 4.

VII. Example: the Gaussian-Quadratic Scenario 

In some other applications of authentication, the content
of interest is inherently continuous. Examples involve sources
such as imagery, video, or audio. In addition to tampering
attacks, such content may encounter degradations as a result
of routine handling that includes compression, transcoding,
resampling, printing, and scanning, as well as perturbations
from editing to enhance the content.

As perhaps the simplest model representative of such continuous
problems, we consider a white Gaussian source with a
white Gaussian reference channel. Specifically, we model the
source as an i.i.d. Gaussian sequence where each $S_i$ has mean
zero and variance $\sigma_S^2$, and the independent reference channel
noise as an i.i.d. sequence whose $i$th element $N_i$ has mean
zero and variance $\sigma_N^2$. Furthermore, we adopt the quadratic
distortion measure $d(a, b) = (a - b)^2$.

While our proofs in Section V exploited that our signals
were drawn from finite alphabets and that all distortion
measures were bounded to simplify our development, the
results can be generalized to continuous-alphabet sources with
unbounded distortion measures using standard methods. In the
sequel, we assume without proof that the coding theorems hold
for Gaussian sources with quadratic distortion. Since it appears
difficult to obtain a closed-form expression for the optimal
distribution for $U$,$^7$ we instead develop good inner and outer
bounds on the boundary of the achievable distortion region.


A. Unachievable Distortions: Inner Bounds 


To derive an inner bound, we ignore the requirement that
reconstructions be authentic and study the
distortions possible in this case.

For a given constraint on the power $P$ input to the reference
channel, it is well-known that the minimum possible source
reconstruction distortion $D_r$ achievable from the output of
the channel can be achieved without either source or channel
coding in this Gaussian scenario, and the resulting distortion
is

$$D_r = \frac{\sigma_N^2 \sigma_S^2}{\sigma_N^2 + P}. \qquad (60)$$

Moreover, for a scheme with encoding distortion $D_e$, the
Cauchy-Schwarz inequality implies that $P$ is bounded according
to

$$\begin{aligned}
P = E[X^2] = E[(X - S + S)^2] &= E[(X - S)^2] + E[S^2] + 2E[(X - S)S] \\
&\le D_e + \sigma_S^2 + 2\sqrt{D_e \sigma_S^2}, \qquad (61)
\end{aligned}$$

where equality holds if and only if $X = \left(1 + \sqrt{D_e/\sigma_S^2}\right) S$.
Thus, substituting (61) into (60) yields the inner bound

$$D_r \ge \frac{\sigma_N^2 \sigma_S^2}{\sigma_N^2 + \left(\sqrt{D_e} + \sigma_S\right)^2}. \qquad (62)$$
B. Achievable Distortions: Outer Bounds 

To derive outer bounds we will consider codebooks where
$(S, U, X)$ are jointly Gaussian. Since it is sufficient to consider
$X$ to be a deterministic function of $U$ and $S$, the innovations
form

$$T \sim N(0, \sigma_T^2), \quad E[TS] = 0 \qquad (63a)$$

$$U = aS + cT \qquad (63b)$$

$$X = bU + dT \qquad (63c)$$

conveniently captures the desired relationships.$^8$ We examine
two regimes: a low $D_e$ regime in which we restrict our
attention to the parameterization $(a, b, c, d) = (1, 1, 1/\alpha, 1 - 1/\alpha)$,
and a high $D_e$ regime in which we restrict our attention to
the parameterization $(a, b, c, d) = (1, \beta, 1, 0)$. As we will see,
time-sharing between these parameterizations yields almost
the entire achievable distortion region for Gaussian codebooks.

7 An analysis using calculus of variations suggests that the optimal distri¬ 
bution is not even Gaussian. 

8 It can be shown that choosing either a = 1 or c = 1 incurs no loss of 
generality. 


Low $D_e$ Regime: We obtain an encoding that is asymptotically
good at low $D_e$ by using a distribution with structure
similar to that used to achieve capacity in the related problem
of information embedding [20]. In the language of [26], the
encoding process involves distortion-compensation. In particular,
the source is amplified by a factor $1/\alpha$, quantized to the
nearest codeword, attenuated by $\alpha$, and then a fraction of the
resulting quantization error is added back to produce the final
encoding, i.e.,

$$X^n = \alpha Q[S^n/\alpha] + (1 - \alpha)(S^n - \alpha Q[S^n/\alpha]) \qquad (64)$$

where $Q[\cdot]$ denotes the quantizer function.

With this encoding structure, it is convenient to make the
assignment $U^n = \alpha Q[S^n/\alpha]$, so that we may write

$$U = S + T/\alpha \qquad (65)$$

$$X = U + (1 - \alpha)(S - U) = S + T \qquad (66)$$

where $T$ is a Gaussian random variable with mean zero
and variance $\sigma_T^2$, independent of both the source $S$ and the
reference channel noise $N$.

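We choose $g(\cdot)$ to be the minimum mean-square estimate
of $S$ given $U$. Thus the resulting distortions are, via (65) and
(66),

$$D_e = E[(X - S)^2] = E[(S + T - S)^2] = \sigma_T^2 \qquad (67)$$

and, in turn,

$$D_r = E[S^2]\left(1 - \frac{E[SU]^2}{E[S^2]E[U^2]}\right) = \frac{\sigma_S^2(\sigma_T^2 + \alpha^2\sigma_S^2) - \alpha^2\sigma_S^4}{\sigma_T^2 + \alpha^2\sigma_S^2} = \frac{\sigma_S^2 D_e}{D_e + \alpha^2\sigma_S^2}. \qquad (68)$$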

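To show that distortions (67) and (68) are achievable
requires proving that (9a) holds. In [20], the associated difference
of mutual informations is computed (using slightly
different notation) as

$$I(U;Y) - I(S;U) = \frac{1}{2}\log\frac{\sigma_T^2(\sigma_T^2 + \sigma_S^2 + \sigma_N^2)}{\sigma_T^2\sigma_S^2(1 - \alpha)^2 + \sigma_N^2(\sigma_T^2 + \alpha^2\sigma_S^2)} \qquad (69)$$

which implies that to keep the difference of mutual informations
nonnegative we need

$$\sigma_T^2(\sigma_T^2 + \sigma_S^2 + \sigma_N^2) \ge \sigma_T^2\sigma_S^2(1 - \alpha)^2 + \sigma_N^2(\sigma_T^2 + \alpha^2\sigma_S^2). \qquad (70)$$

Collecting terms in powers of $\alpha$ yields

$$\alpha^2(\sigma_T^2\sigma_S^2 + \sigma_N^2\sigma_S^2) - 2\alpha\sigma_T^2\sigma_S^2 - \sigma_T^4 = \sigma_S^2(\sigma_T^2 + \sigma_N^2)(\alpha - r_+)(\alpha - r_-) \le 0 \qquad (71)$$

where

$$r_+ = \frac{1 + \sqrt{1 + \sigma_T^2/\sigma_S^2 + \sigma_N^2/\sigma_S^2}}{1 + \sigma_N^2/\sigma_T^2} > 0 \qquad (72)$$

$$r_- = \frac{1 - \sqrt{1 + \sigma_T^2/\sigma_S^2 + \sigma_N^2/\sigma_S^2}}{1 + \sigma_N^2/\sigma_T^2} < 0. \qquad (73)$$

Therefore to satisfy the mutual information constraint we need

$$r_- \le \alpha \le r_+.$$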
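The relationship between (69) and the roots (72) and (73) can be checked numerically. The sketch below (function names are ours) evaluates the mutual information difference in bits and confirms that it vanishes at $\alpha = r_+$ and $\alpha = r_-$, is positive between the roots, and is negative outside them.

```python
import math


def mi_gap(alpha, st2, ss2, sn2):
    """I(U;Y) - I(S;U) from (69), in bits."""
    num = st2 * (st2 + ss2 + sn2)
    den = st2 * ss2 * (1.0 - alpha) ** 2 + sn2 * (st2 + alpha ** 2 * ss2)
    return 0.5 * math.log2(num / den)


def alpha_roots(st2, ss2, sn2):
    """Roots r_+ and r_- from (72) and (73)."""
    root = math.sqrt(1.0 + st2 / ss2 + sn2 / ss2)
    scale = 1.0 + sn2 / st2
    return (1.0 + root) / scale, (1.0 - root) / scale


st2, ss2, sn2 = 1.0, 4.0, 1.0  # sigma_T^2, sigma_S^2, sigma_N^2 (example values)
r_plus, r_minus = alpha_roots(st2, ss2, sn2)
print(r_plus, r_minus)
print(mi_gap(r_plus, st2, ss2, sn2), mi_gap(0.5, st2, ss2, sn2), mi_gap(2.0, st2, ss2, sn2))
```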
To minimize the distortions, (68) implies we want
$|\alpha|$ as large as possible subject to the constraint $r_- \le \alpha \le r_+$. Thus we
choose $\alpha = r_+$, from which we see that

$$\alpha_{\text{auth}} = \alpha_{\text{ie}}\left(1 + \sqrt{1 + \sigma_T^2/\sigma_S^2 + \sigma_N^2/\sigma_S^2}\right) \qquad (74)$$

where $\alpha_{\text{ie}} = \sigma_T^2/(\sigma_T^2 + \sigma_N^2)$ is the corresponding information
embedding scaling parameter determined by Costa [20]. Evidently,
the scaling parameter for the authentication problem
is at least twice the scaling for information embedding and
significantly larger when either the SNR $\sigma_S^2/\sigma_N^2$ or the signal-to-(encoding)-distortion
ratio (SDR) $\sigma_S^2/\sigma_T^2$ is small.

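High $D_e$ Regime: An encoder that essentially amplifies the
quantization of the source to overcome the reference channel
noise is asymptotically good at high $D_e$. A system with
this structure corresponds to choosing the encoder random
variables according to

$$U = S + T \qquad (75)$$

$$X = \beta U. \qquad (76)$$

In turn, choosing as $g(\cdot)$ the minimum mean-square error
estimator of $S$ given $U$ yields the distortions

$$D_e = (1 - \beta)^2\sigma_S^2 + \beta^2\sigma_T^2 \qquad (77)$$

$$D_r = \frac{\sigma_S^2\sigma_T^2}{\sigma_S^2 + \sigma_T^2}. \qquad (78)$$

It remains only to determine $\beta$. Since

$$I(U;S) = \frac{1}{2}\log\frac{\sigma_S^2 + \sigma_T^2}{\sigma_T^2} \qquad (79)$$

and

$$I(U;Y) = \frac{1}{2}\log\left(1 + \frac{\beta^2(\sigma_S^2 + \sigma_T^2)}{\sigma_N^2}\right), \qquad (80)$$

the mutual information constraint (9a) implies that

$$\beta^2 \ge \frac{\sigma_N^2\sigma_S^2}{\sigma_T^2(\sigma_S^2 + \sigma_T^2)}. \qquad (81)$$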
C. Comparing and Interpreting the Bounds 

Using (68) with $\alpha$ given by (74) and varying $\sigma_T^2$ yields
one outer bound. Using (77) and (78) with (81) and again
varying $\sigma_T^2$ yields the other outer bound. The lower convex
envelope of this pair of outer bounds is depicted in Fig. 5
at different SNRs. To see that the first and second outer
bounds are asymptotically the best achievable for low and high
$D_e$, respectively, we superimpose on these figures the best
Gaussian codebook performance, as obtained by numerically
optimizing the parameters in (63).

By using (62), (68), and (78), it is possible to show that
for any fixed $D_e > \sigma_N^2$ the inner and outer bounds converge
asymptotically in SNR in the sense that

$$\lim_{\text{SNR} \to \infty} \frac{D_{r,\text{outer}}}{D_{r,\text{inner}}} = 1$$

where $D_{r,\text{inner}}$ and $D_{r,\text{outer}}$ represent the inner and outer bounds
corresponding to the fixed value of $D_e$. Thus, in this high SNR
regime, Gaussian codebooks are optimal, and (62) accurately
characterizes their performance, as reflected in Fig. 5.

The figure also indicates (and it is possible to prove) that
for any fixed SNR, the inner and outer bounds converge
asymptotically in $D_e$ in the sense that

$$\lim_{D_e \to \infty} \frac{D_{r,\text{outer}}(D_e)}{D_{r,\text{inner}}(D_e)} = 1$$

where $D_{r,\text{inner}}(D_e)$ and $D_{r,\text{outer}}(D_e)$ represent the inner and
outer bounds as a function of the encoding distortion $D_e$.
Evidently in this high encoding distortion regime, $D_r/\sigma_N^2$ can
be made arbitrarily small by using Gaussian codebooks and
making $D_e/\sigma_N^2$ sufficiently large. While this implies that, in
principle, there is no fundamental limit to how small we can
make $D_r$ by increasing $D_e$ through amplification of the source,
in practice secondary effects not included in the model such
as saturation or clipping will provide an effective limit.

Finally, note that the cost of providing authentication is
readily apparent since the inner bound from (62) represents
the distortions achievable when the reconstruction need not
be authentic. Since for a fixed SNR, the bounds converge
asymptotically for large $D_e$, and for a fixed $D_e > \sigma_N^2$ the
bounds converge asymptotically for large SNR, we conclude
that the price of authentication is negligible in these regimes.
However, for low $D_e$ regimes of operation, requiring authenticity
strictly reduces the achievable distortion region. This
behavior is analogous to that observed in the binary-Hamming
case.

VIII. Comparing Authentication Architectures 

The most commonly studied architectures for authentication 
are robust watermarking (i.e., self-embedding) and fragile 
watermarking. In the sequel we compare these architectures 
to that developed in this paper. 

A. Authentication Systems Based on Robust Watermarking 

The robust watermarking approach to encoding for authentication
(see, e.g., [4], [10], [11], [15], [16]) takes the
form of a quantize-and-embed strategy. The basic steps of the
encoding are as follows. First, the source $S^n$ is quantized
to a representation in terms of bits using a source coding
(compression) algorithm. Second, the bits are protected using
a cryptographic technique such as a digital signature or
hash function. Finally, the protected bits are embedded into
the original source using an information embedding (digital
watermarking) algorithm. At the decoder, the embedded bits
are extracted. If their authenticity is verified via the appropriate
cryptographic technique, a reconstruction of the source is
produced from the bits. Otherwise, the decoder declares that
an authentic reconstruction is not possible.

It is straightforward to develop the information-theoretic
limits of such approaches, and to compare the results to
the optimum systems developed in the preceding sections. In
particular, if we use optimum source coding and information
embedding in the quantize-and-embed approach, it follows
that, in contrast to Theorem 1, the distortion pair $(D_e, D_r)$ lies
in the achievable distortion region for a quantize-and-embed
structured solution to the problem if and only if there exist
distributions $p(\hat{s}|s)$ and $p(u|s)$, and a function $f(\cdot,\cdot)$, such that

$$I(U;Y) - I(S;U) \ge I(S;\hat{S}) \qquad (82a)$$

$$E[d_e(S, f(U,S))] \le D_e \qquad (82b)$$

$$E[d_r(S, \hat{S})] \le D_r. \qquad (82c)$$

These results follow from the characterization of the rate-distortion
function of a source [43] and the capacity of
information embedding systems with distortion constraints as
developed in [36] as an extension of [19].

Comparing (82) to (9) with $\hat{S} = g(U)$ we see that
quantize-and-embed systems are unnecessarily constrained,
which translates to a loss of efficiency relative to the optimum
joint source-channel-authentication coding system constructions
developed in the preceding sections. This performance penalty can be quite
severe in the typical regimes of interest, as we now illustrate.
In particular, we quantify this behavior in the two example scenarios
considered earlier: the binary-Hamming and Gaussian-quadratic
cases.

Fig. 5. Bounds on the achievable distortion region for the Gaussian-quadratic problem. The lowest solid curve is the inner bound corresponding to the
boundary of the achievable region when reconstructions need not be authentic. The numerically obtained upper solid curve is the outer bound resulting from
the use of Gaussian codebooks. The dashed curve corresponds to the lower convex envelope of the simple low and high $D_e$ analytic outer bounds derived in
the text.

1) Example: Binary-Hamming Case: In this scenario, the
rate-distortion function is [43]

$$R(D_r) = 1 - h(D_r), \qquad (83)$$

while the information embedding capacity is (see [36]) the
upper concave envelope of the function

$$g_p(D_e) = \begin{cases} 0, & \text{if } 0 \le D_e < p, \\ h(D_e) - h(p), & \text{if } p \le D_e \le 1/2, \end{cases} \qquad (84)$$

i.e.,

$$C(D_e) = \begin{cases} \dfrac{g_p(D_p)}{D_p} D_e, & \text{if } 0 \le D_e \le D_p, \\ g_p(D_e), & \text{if } D_p < D_e \le 1/2, \end{cases} \qquad (85)$$

where $D_p = 1 - 2^{-h(p)}$. Equating $R$ in (83) to $C$ in (85),
we obtain a relation between $D_r$ and $D_e$. This curve is
depicted in Fig. 6 for different reference channel parameters.
As this figure reflects, the optimum quantize-and-embed
system performance lies strictly inside the achievable region
for the binary-Hamming scenario developed in Section VI,
with the performance gap largest for the cleanest reference
channels. Moreover, since as we saw in Section VI clean
reference channels correspond to ensuring small encoding and
reconstruction distortions, this means that quantize-and-embed
systems suffer the largest losses precisely in the regime one
would typically want to operate in.
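The quantize-and-embed curve obtained by equating (83) and (85) has no closed form, but it is easy to compute. The following sketch is an illustration under stated assumptions: the function names are ours and the inversion of $1 - h(D_r)$ uses a simple bisection on $[0, 1/2]$.

```python
import math


def h(q):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)


def embedding_capacity(d_e, p):
    """C(D_e) from (85): upper concave envelope of g_p in (84), for 0 <= D_e <= 1/2."""
    d_p = 1.0 - 2.0 ** (-h(p))
    if d_e <= d_p:
        return (h(d_p) - h(p)) * d_e / d_p
    return h(d_e) - h(p)


def qe_reconstruction_distortion(d_e, p, iters=60):
    """Solve 1 - h(D_r) = C(D_e) for D_r in [0, 1/2] by bisection."""
    target = embedding_capacity(d_e, p)
    lo, hi = 0.0, 0.5
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if 1.0 - h(mid) > target:   # rate still too high: move toward larger D_r
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


p = 0.2
for d_e in (0.0, p, 0.5):
    print(d_e, qe_reconstruction_distortion(d_e, p))
```

At $D_e = p$ the quantize-and-embed reconstruction distortion exceeds $p$, whereas the optimum system attains $D_e = D_r = p$.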
Fig. 6. Performance loss of quantize-and-embed systems for the binary-Hamming scenario with various reference channel crossover probabilities p. The solid curve depicts the boundary of the achievable regions for the optimum system; the dashed curve depicts that of the best quantize-and-embed system.
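
The quantize-and-embed trade-off for the binary-Hamming case has no simple closed form, but it can be evaluated numerically by equating (83) and (85). The following Python sketch is one way to do so; the function names are illustrative (they do not appear in the paper), entropies are measured in bits, and the reference channel crossover probability is assumed to satisfy 0 < p < 1/2.

```python
import math


def h(x):
    """Binary entropy function in bits, with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def embedding_capacity(d_e, p):
    """C(D_e) as in (85): the upper concave envelope of g_p in (84).

    Assumes 0 <= d_e <= 1/2 and 0 < p < 1/2.
    """
    def g(d):
        return 0.0 if d < p else h(d) - h(p)

    d_p = 1 - 2 ** (-h(p))           # tangent point of the envelope
    if d_e <= d_p:
        return g(d_p) * d_e / d_p    # straight-line segment through the origin
    return g(d_e)


def qe_reconstruction_distortion(d_e, p, tol=1e-12):
    """Solve 1 - h(D_r) = C(D_e) for D_r in [0, 1/2] by bisection."""
    target = 1 - embedding_capacity(d_e, p)   # required value of h(D_r)
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if h(mid) < target:          # h is increasing on [0, 1/2]
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

As a sanity check on the sketch, at D_e = 1/2 the embedding capacity equals 1 - h(p), so the resulting reconstruction distortion is p; at D_e = 0 nothing can be embedded and the reconstruction distortion is 1/2.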


2) Example: Gaussian-Quadratic Case: In this scenario, the rate-distortion function is [43]

$$R(D_r) = \begin{cases} \frac{1}{2}\log\frac{\sigma_S^2}{D_r}, & 0 < D_r \le \sigma_S^2, \\ 0, & D_r > \sigma_S^2, \end{cases} \tag{86}$$

while the information embedding capacity is [20]

$$C(D_e) = \frac{1}{2}\log\left(1 + \frac{D_e}{\sigma_N^2}\right). \tag{87}$$

Again, equating $R$ in (86) to $C$ in (87), we obtain the following relation between $D_r$ and $D_e$ for all $D_e > 0$:

$$D_r = \frac{\sigma_S^2}{1 + D_e/\sigma_N^2}. \tag{88}$$

This curve is depicted in Fig. 7 for different reference channel SNRs. This figure reflects that the optimum quantize-and-embed system performance lies strictly inside the achievable region for the Gaussian-quadratic scenario developed earlier in the paper. Likewise, the performance gap is largest for the highest SNR reference channels. Indeed, comparing the inner bound on the performance of the optimum system with that of quantize-and-embed, i.e., (88), we see that while quantize-and-embed incurs no loss at low SNR:

$$\frac{D_r^{QE}}{D_r} \to 1 \quad \text{as} \quad \frac{\sigma_S^2}{\sigma_N^2} \to 0, \tag{89}$$

at high SNR the loss is as much as SNR/2 for $D_e \ge \sigma_N^2$:

$$\frac{\sigma_N^2}{\sigma_S^2}\,\frac{D_r^{QE}}{D_r} \to \frac{1}{1 + D_e/\sigma_N^2} \le \frac{1}{2}, \tag{90}$$

where we have used $D_r^{QE}$ to denote the quantize-and-embed reconstruction distortion (88).

Hence, as in the binary-Hamming case, we see again that quantize-and-embed systems suffer the largest losses in the regime where one is most interested in operating: that where the editor is allowed to make only perturbations small enough that the corresponding encoding and reconstruction distortions are small.⁹

Fig. 7. Performance loss of quantize-and-embed systems for the Gaussian-quadratic scenario at various reference channel SNRs. The solid curve depicts the asymptotic outer bound of the achievable regions for the optimum system; the dashed curve depicts that of the best quantize-and-embed system.
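
The low-SNR and high-SNR behavior in (89) and (90) can be checked numerically. The Python sketch below does so under a stated assumption: the performance of the optimum system is taken to be the single-layer bound $\sigma_S^2\sigma_N^2 / (\sigma_N^2 + (\sqrt{D_e} + \sigma_S)^2)$, which has the same form as the bound (105) used later for the fine layer. The function names are illustrative only.

```python
import math


def qe_distortion(d_e, var_s, var_n):
    """Quantize-and-embed reconstruction distortion, as in (88)."""
    return var_s / (1.0 + d_e / var_n)


def optimum_bound(d_e, var_s, var_n):
    """Assumed single-layer bound for the optimum system (same form as (105))."""
    return var_s * var_n / (var_n + (math.sqrt(d_e) + math.sqrt(var_s)) ** 2)


def loss_ratio(d_e, var_s, var_n):
    """Ratio D_r^QE / D_r comparing quantize-and-embed with the bound above."""
    return qe_distortion(d_e, var_s, var_n) / optimum_bound(d_e, var_s, var_n)
```

With $\sigma_N^2 = 1$ and $D_e = \sigma_N^2$, evaluating `loss_ratio` at a very small source variance gives a value close to 1, while at a very large source variance the ratio normalized by the SNR approaches 1/2, in line with (89) and (90).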


B. Authentication Systems Based on Fragile Watermarking 

A fundamentally different approach to the authentication 
problems of this paper is based on constraining the semantic 
severity of the modifications the editor is allowed to make. 
In particular, given a distortion measure that captures the 
semantic impact of edits to the content, the decoder will 
declare the edited content authentic if and only if the distortion 
is below some predetermined threshold. We refer to these as 
authentication systems based on semantic thresholding. 

⁹ It should be emphasized that while one could argue that the quadratic distortion measure is a poor measure of semantic proximity in many applications, such reasoning confuses two separate issues. We show here that quantize-and-embed systems are quite poor when the quadratic measure corresponds exactly to the semantics of interest. For problems where it is a poor match, one can expect systems based on more accurate measures to exhibit the same qualitative behavior: quantize-and-embed systems will be least attractive in regimes where the source encodings and reconstructions are constrained to be semantically close to the original source.


It is important to appreciate that the manner in which the editor is constrained in systems based on semantic thresholding is qualitatively quite different from the way the editor is constrained in the systems developed in this paper. In particular, in our formulation, the editor is constrained according to a reference channel model that can be freely chosen, independently of any semantic model.

While in this section we are primarily interested in discussing the properties of such systems, we first briefly describe how such systems can be designed. We begin by noting that the role of the encoder in such systems is to mark the original content so as to enable the eventual decoder to estimate the distortion between the edited content and that original content, despite not having direct access to the latter.

One approach to such a problem would be to use the self-embedding idea discussed in Section VIII-A. In particular, a compressed version of the original content would be embedded into that content so that it could be reliably extracted from the edited content by the decoder and used in the distortion calculation. In practice, such self-embedding can be somewhat resource inefficient, much as it was in the context of Section VIII-A. Instead, an approach based on so-called fragile watermarking is more typically proposed, which allows the decoder to measure the distortion without explicitly being given an estimate of the original content. With this approach, distortions in the known watermark that result from editing the content are used to infer the severity of distortion in the content itself.

Typical implementations of the fragile watermarking approach to encoding for authentication (see, e.g., [5], [7], [13], [14]) take the following form. A watermark message $M$ known only to the encoder and decoder (and kept secret from the editor) is embedded into the source signal by the encoder. The editor's processing of the encoded content indirectly perturbs the watermark. A decoder extracts this perturbed watermark $\hat{M}$, measures the size of the perturbation (e.g., by computing the distortion between $M$ and $\hat{M}$ with respect to some suitable measure), then uses the result to assess the (semantic) severity of the editing the content has undergone. If the severity is below some predetermined threshold, the decoder declares the signal to be authentic.
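
The decision rule just described can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the watermark is a bit sequence, that the "suitable measure" is the normalized Hamming distance, and that the threshold is a design parameter; none of these choices is prescribed by the cited schemes, and the function names are hypothetical. Watermark embedding and extraction are outside the scope of the sketch.

```python
def hamming_fraction(m, m_hat):
    """Fraction of positions in which the extracted watermark differs."""
    if len(m) != len(m_hat):
        raise ValueError("watermarks must have equal length")
    return sum(a != b for a, b in zip(m, m_hat)) / len(m)


def declare_authentic(m, m_hat, threshold):
    """Semantic-thresholding rule: accept iff the watermark perturbation
    is below the predetermined threshold."""
    return hamming_fraction(m, m_hat) < threshold
```

For example, `declare_authentic([0, 1, 1, 0], [0, 1, 1, 1], 0.5)` accepts, since only one of four watermark bits was perturbed, while a watermark with all four bits flipped is rejected at the same threshold.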

A detailed information-theoretic characterization of authentication systems based on semantic thresholding is beyond the scope of this paper. However, in the sequel we emphasize some important qualitative differences in the security characteristics between such schemes and those developed in this paper. In particular, as we now develop, there is a fundamental vulnerability in semantic thresholding schemes that results from their inherent sensitivity to mismatch in the chosen semantic model.

To see this, consider a mismatch scenario in which the 
authentication system is designed with an incorrect semantic 
model (distortion measure). If the system is based on semantic 
thresholding, then an attacker who recognizes the mismatch 
can exploit this knowledge to make an edit that is semantically 
significant, but which the system will deem as semantically 
insignificant due to the model error, and thus accept as 
authentic. Thus, for such systems, a mismatch can lead to 
a security failure. 

By contrast, for the authentication systems developed in this paper, designing the system based on the incorrect semantic model reduces the efficiency of the system, but does not impact its security. In particular, use of the incorrect semantic model leads to encodings and/or authentic reconstructions with unnecessarily high distortions (with respect to the correct model). However, attackers cannot exploit this to circumvent the security mechanism, since they are constrained by the reference channel, which is independent of the semantic model.

From such arguments, one might conclude that systems 
based on semantic thresholding might be preferable so long as 
care is taken to develop accurate semantic models. However, 
such a viewpoint fails to recognize that in practice some degree 
of mismatch is inevitable — the high complexity of accurate 
semantic models makes them inherently difficult to learn. 
Thus, in a practical sense, authentication systems based on 
semantic thresholding are intrinsically less secure than those 
developed in this paper. 


IX. Layered Authentication: Broadcast 
Reference Channels 


For many applications, one might be interested in an 
authentication system with the property that an authentic 
reconstruction is always produced, but that its quality degrades 
gracefully with the extensiveness of the editing the content has 
undergone. In this section we show that discretized versions 
of such behavior are possible, and can be built as a natural 
extension of the formulation of this paper. 

To develop this idea, we begin by observing that the systems developed thus far in the paper represent a first-order approximation to such behavior. In particular, for edits consistent with the reference channel model, an authentic reconstruction of fixed quality is produced. When the editing is not consistent with the reference channel, the only possible authentic reconstruction is the minimal quality one obtained from the a priori distribution for the content, since the edited version must be ignored altogether. In this section, we show that by creating a hierarchy of reference channels corresponding to increasing amounts of editing, one can create multiple authentic reconstructions. In this way, a graceful degradation characteristic can be obtained to any desired granularity.

Such systems can be viewed as layered authentication systems, and arise naturally out of the use of broadcast reference channel models. With such systems there is a fixed encoding of the source that incurs some distortion. Then, from edited content that is consistent with any of the constituent reference channels in the broadcast model, the decoder produces an authentic reconstruction of some corresponding fidelity. Otherwise, the decoder declares that an authentic reconstruction is not possible.

For the purpose of illustration, we focus on the two-user memoryless degraded broadcast channel [43] as our reference channel. This corresponds to a two-layer authentication system. For convenience, we refer to the strong channel as the "mild-edit" one, and the weak channel, which is a degraded version of the strong one, as the "harsh-edit" one. Edits consistent with the mild-edit branch of the reference channel will allow higher quality authentic reconstructions, which we will call "fine," while edits consistent with the harsh-edit branch will allow lower quality authentic reconstructions, which we will call "coarse." For edits inconsistent with either branch, the only authentic reconstruction will be one that ignores the edited data, which will be of lowest quality.

In this scenario, for any prescribed level of encoding distortion $D_e$, there is a fundamental trade-off between the achievable distortions $D_r^f$ and $D_r^c$ of the corresponding fine and coarse authentic reconstructions, respectively. Of course $D_r^c \ge D_r^f$ will always be satisfied. However, as we will see, achieving smaller values of $D_r^c$ in general requires accepting larger values of $D_r^f$ and vice versa. Using the ideas of this paper, one can explore the fundamental nature of such trade-offs.
A. Achievable Distortion Regions

The scenario of interest is depicted in Fig. 8. As a natural generalization of its definition in the single-layer context, an instance of the layered authentication problem consists of the eight-tuple

$$\{\mathcal{S},\, p(s),\, \mathcal{X},\, \mathcal{Y},\, p(y_c|y_f),\, p(y_f|x),\, d_e(\cdot,\cdot),\, d_r(\cdot,\cdot)\}, \tag{91}$$

where, since our reference channel is a degraded broadcast channel, the reference channel law takes the form

$$p(y_c^n, y_f^n | x^n) = p(y_c^n | y_f^n)\, p(y_f^n | x^n). \tag{92}$$

Let $\hat{S}_c^n$ denote the (coarse) authentic reconstruction obtained when decoder input is consistent with the harsh-edit output of the reference channel, and let $\hat{S}_f^n$ denote the (fine) authentic reconstruction obtained when decoder input is consistent with the mild-edit output of the reference channel. In turn, the corresponding two reconstruction distortions are defined according to

$$D_r^c = \frac{1}{n}\sum_{i=1}^{n} d_r(S_i, \hat{S}_{c,i}) \tag{93a}$$

$$D_r^f = \frac{1}{n}\sum_{i=1}^{n} d_r(S_i, \hat{S}_{f,i}). \tag{93b}$$


The following theorem develops trade-offs between the encoding distortion $D_e$ and the two reconstruction distortions (93) that are achievable.

Theorem 2: The distortion triple $(D_e, D_r^c, D_r^f)$ lies in the achievable distortion region for the layered authentication problem (91) if there exist distributions $p(u,t|s)$ and $p(x|u,t,s)$, and functions $g_c(\cdot)$ and $g_f(\cdot,\cdot)$ such that

$$I(U;Y_c) - I(S;U) \ge 0 \tag{94a}$$
$$I(T;Y_f|U) - I(S;T|U) \ge 0 \tag{94b}$$
$$E[d_e(S,X)] \le D_e \tag{94c}$$
$$E[d_r(S, g_c(U))] \le D_r^c \tag{94d}$$
$$E[d_r(S, g_f(U,T))] \le D_r^f. \tag{94e}$$

In this theorem, the achievable distortion region is defined in a manner that is the natural generalization of that for single-layer systems as given in Definition 2.

In the interests of brevity and since it closely parallels that 
for the single-layer case, we avoid a formal derivation of this 
result. Instead, we sketch the key ideas of the construction. 
We also leave determining the degree to which the distortion 
region can be further extended via more elaborate coding for 
future work. 

Proof: [Sketch of Proof:]

First a codebook $\mathcal{C}_c$ is created for the harsh-edit layer at rate $R_c = I(U;S) + 2\gamma$, where only $2^{n(R_c - \gamma)}$ codewords are marked as admissible, as in Theorem 1. Then for each codeword $c_c \in \mathcal{C}_c$ an additional random codebook $\mathcal{C}_f(c_c)$ of rate $R_f = I(T;S|U) + 2\gamma$ is created according to the marginal distribution $p(t|u)$, where only $2^{n(R_f - \gamma)}$ codewords are marked as admissible.

The encoder first searches $\mathcal{C}_c$ for an admissible codeword $c_c$ jointly typical with the source and then searches $\mathcal{C}_f(c_c)$ for a refinement $c_f$ that is jointly typical with the source. The pair $(c_c, c_f)$ is then mapped into the channel according to $p(x|u,t,s)$. By standard arguments the encoding will succeed with high probability provided that $R_c > I(U;S)$ and $R_f > I(T;S|U)$.

When the channel output is consistent with either output of the reference channel, the decoder locates an admissible codeword $c_c \in \mathcal{C}_c$ jointly typical with the signal. If the signal is consistent with the harsh-edit output of the reference channel, in particular, the decoder then produces the coarse authentic reconstruction $\hat{S}_c^n = g_c(c_c)$. However, if the signal is consistent with the mild-edit output of the reference channel, the decoder then proceeds to locate an admissible $c_f \in \mathcal{C}_f(c_c)$ and produces the fine authentic reconstruction $\hat{S}_f^n = g_f(c_c, c_f)$.

By arguments similar to those used in the single-layer case (i.e., the proof of Theorem 1), this strategy achieves vanishingly small probabilities of successful attack, and when the reference channel is in effect meets the distortion targets provided that $R_c < I(U;Y_c)$ and $R_f < I(T;Y_f|U)$.


B. Example: Gaussian-Quadratic Case 

The Gaussian-quadratic case corresponds to the mild- and harsh-edit outputs of the reference channel taking the forms $Y_f = X + N$ and $Y_c = Y_f + V$, respectively, where $N$ and $V$ are Gaussian random variables independent of each other, as well as $S$ and $X$.

For this case, a natural approach to the layered authentication system design has the structure depicted in Fig. 9, which generalizes that of the single-layer systems developed earlier in the paper. The encoder determines the codeword $T^n$ nearest the source $S^n$, then perturbs $T^n$ so as to reduce the encoding distortion, producing the encoding $X^n$. If the channel output stays within the darkly shaded sphere centered about $T^n$, e.g., producing $Y_f^n$ as shown, the decoder produces a fine-grain authentic reconstruction from $T^n$. If the channel output is outside the darkly shaded sphere, but inside the encompassing lightly shaded sphere centered about $U^n$, e.g., producing $Y_c^n$ as shown, the decoder produces a coarse-grain authentic reconstruction from $U^n$. If the channel output is outside any shaded region, e.g., producing $Z^n$, the decoder indicates that an authentic reconstruction is not possible.
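
The three-way decision in this nested-sphere picture can be summarized in a few lines of Python. This is a deliberately simplified sketch: it assumes the decoder has already identified the candidate fine and coarse codewords (in the actual scheme this involves a search over admissible codewords), and the sphere radii and all names are hypothetical parameters introduced only for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def layered_decision(y, fine_center, coarse_center, fine_radius, coarse_radius):
    """Return which authentic reconstruction (if any) the decoder may produce.

    "fine":   y lies in the darkly shaded sphere about the fine codeword
    "coarse": y lies only in the lightly shaded sphere about the coarse codeword
    None:     y lies outside both, so no authentic reconstruction is possible
    """
    if euclidean(y, fine_center) <= fine_radius:
        return "fine"
    if euclidean(y, coarse_center) <= coarse_radius:
        return "coarse"
    return None
```

In this sketch the fine test is applied first because the fine sphere is nested inside the coarse one, so an output that passes the fine test would also pass the coarse test.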

An achievable distortion region for this layered authentication scenario is obtained from Theorem 2 with the auxiliary random variables chosen according to

$$U = S + A/\alpha \tag{95}$$
$$T = S + B/\beta \tag{96}$$
$$X = S + A + B, \tag{97}$$


where $A$ and $B$ are Gaussian random variables independent of $S$. Choosing $g_c(\cdot)$ and $g_f(\cdot,\cdot)$ to be the minimum mean-square error estimates of $S$ from $U$ and $(U,T)$, respectively, yields

$$D_e = \sigma_A^2 + \sigma_B^2 \tag{98}$$

$$D_r^c = \sigma_S^2\left[1 - \frac{E[SU]^2}{E[S^2]\,E[U^2]}\right] = \frac{\sigma_S^2\,\sigma_A^2}{\sigma_A^2 + \alpha^2\sigma_S^2} \tag{99}$$

$$D_r^f = \sigma_S^2 - \Lambda_{S,[UT]}\,\Lambda_{[UT]}^{-1}\,\Lambda_{[UT],S}, \tag{100}$$

where $\Lambda$ with a single subscript denotes the covariance of its argument, and $\Lambda$ with a subscript pair denotes the cross-covariance between its arguments.

Fig. 8. Two-layer authentication system operation when the reference channel is in effect. From the outputs $Y_f^n$ and $Y_c^n$ of the degraded broadcast reference channel, corresponding to mild and harsh editing, the respective fine and coarse authentic reconstructions $\hat{S}_f^n$ and $\hat{S}_c^n$ are produced. The common encoding obtained from the source $S^n$ is $X^n$.

Fig. 9. Illustration of the nested codebook geometry associated with a two-layer authentication system for the Gaussian-quadratic scenario. The centers of large and small shaded spheres correspond to admissible coarse and fine authentic reconstructions, respectively.
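
The coarse and fine reconstruction distortions (99) and (100) are straightforward to evaluate once the covariances are written out. The Python sketch below does this under the assumption that $S$, $A$, and $B$ are zero-mean and mutually independent (the text states independence from $S$; mutual independence of $A$ and $B$ is an added assumption, consistent with (98)). The function names are illustrative.

```python
def coarse_distortion(var_s, var_a, alpha):
    """D_r^c as in (99): MMSE of S given U = S + A/alpha."""
    return var_s * var_a / (var_a + alpha ** 2 * var_s)


def fine_distortion(var_s, var_a, var_b, alpha, beta):
    """D_r^f as in (100): MMSE of S given (U, T), with T = S + B/beta."""
    var_u = var_s + var_a / alpha ** 2      # E[U^2]
    var_t = var_s + var_b / beta ** 2       # E[T^2]
    cov_ut = var_s                          # E[UT], since A and B are independent
    det = var_u * var_t - cov_ut ** 2       # determinant of the covariance of (U, T)
    # Quadratic form c^T Lambda^{-1} c with c = (E[SU], E[ST]) = (var_s, var_s),
    # using the closed-form inverse of a 2x2 matrix.
    quad = var_s ** 2 * (var_u + var_t - 2 * cov_ut) / det
    return var_s - quad
```

As a check, with all variances equal to one and $\alpha = \beta = 1$, the coarse estimate sees one unit-variance noisy observation of $S$ and the fine estimate sees two independent ones, so the distortions are 1/2 and 1/3, respectively.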

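To produce $\hat{S}_c^n$, a decoder essentially views $B$ as additive channel noise. Therefore, we can immediately apply the arguments from Section VII-B to obtain

$$I(U;Y_c) - I(S;U) = \frac{1}{2}\log\frac{\sigma_A^2\,(\sigma_A^2 + \sigma_S^2 + \sigma_N^2 + \sigma_V^2 + \sigma_B^2)}{\sigma_A^2\sigma_S^2(1-\alpha)^2 + (\sigma_N^2 + \sigma_V^2 + \sigma_B^2)(\sigma_A^2 + \alpha^2\sigma_S^2)}. \tag{101}$$

From this we can solve for $\alpha$ as in the single-layer case by simply replacing the single-layer encoding-perturbation and channel-noise variances with $\sigma_A^2$ and $\sigma_N^2 + \sigma_V^2 + \sigma_B^2$, respectively, in the corresponding single-layer expression.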
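
The closed form (101) can be cross-checked against a direct computation of the two mutual informations for jointly Gaussian variables. The Python sketch below does this, lumping the effective noise seen by the coarse layer into a single variance `var_z` (standing for $\sigma_N^2 + \sigma_V^2 + \sigma_B^2$) and assuming $S$, $A$, and the lumped noise are zero-mean and mutually independent. Logarithms are taken base 2, which is a choice made here for concreteness; the sign of the expression does not depend on the base.

```python
import math


def rate_gap_closed_form(alpha, var_s, var_a, var_z):
    """Right-hand side of (101), in bits."""
    num = var_a * (var_a + var_s + var_z)
    den = var_a * var_s * (1 - alpha) ** 2 + var_z * (var_a + alpha ** 2 * var_s)
    return 0.5 * math.log2(num / den)


def rate_gap_direct(alpha, var_s, var_a, var_z):
    """I(U;Y_c) - I(S;U) for U = S + A/alpha and Y_c = S + A + Z,
    using I(P;Q) = -0.5 * log2(1 - rho^2) for jointly Gaussian scalars."""
    var_u = var_s + var_a / alpha ** 2
    var_y = var_s + var_a + var_z
    cov_uy = var_s + var_a / alpha
    i_uy = -0.5 * math.log2(1 - cov_uy ** 2 / (var_u * var_y))
    i_su = -0.5 * math.log2(1 - var_s / var_u)   # rho^2 = var_s / var_u
    return i_uy - i_su
```

For instance, with $\alpha = 1/2$ and all three variances equal to one, both functions evaluate to 0.5 bit, so condition (94a) holds for that parameter choice.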

Finally, since

$$I(S;T|U) - I(T;Y_f|U) = H(T|U,Y_f) - H(T|U,S) = H(T,U,Y_f) - H(U,Y_f) - H(T,U,S) + H(U,S), \tag{102}$$

we see that (94b) implies

$$\frac{\det(\Lambda_{[T\,U\,Y_f]})}{\det(\Lambda_{[U\,Y_f]})} \le \frac{\det(\Lambda_{[T\,U\,S]})}{\det(\Lambda_{[U\,S]})}. \tag{103}$$


By varying $\sigma_A^2$, $\sigma_B^2$, and $\beta$ such that (103) is satisfied, we can trace out the volume of an achievable distortion region. Fig. 10 shows slices of this three-dimensional region by plotting the fine and coarse reconstruction distortions $D_r^f$ and $D_r^c$ for various values of the encoding distortion $D_e$. Note that it follows from our single-layer inner bounds that for a particular choice of encoding distortion $D_e$, the achievable trade-offs between $D_r^f$ and $D_r^c$ are contained within the region

$$D_r^c \ge \frac{\sigma_S^2\,(\sigma_N^2 + \sigma_V^2)}{\sigma_N^2 + \sigma_V^2 + (\sqrt{D_e} + \sigma_S)^2} \tag{104}$$

$$D_r^f \ge \frac{\sigma_S^2\,\sigma_N^2}{\sigma_N^2 + (\sqrt{D_e} + \sigma_S)^2}, \tag{105}$$

where obviously the lower bound of (105) is smaller than that of (104).

A simple alternative to the layering system for such authentication problems is time-sharing, whereby some fraction of time the encoder uses a codebook appropriate for the harsh-edit reference channel, and for the remaining time uses a codebook appropriate for the mild-edit reference channel. When the harsh-edit reference channel is in effect, the decoder produces the coarse authentic reconstruction for the fraction of time the corresponding codebook is in effect and produces zero the rest of the time. When the mild-edit reference channel is in effect, the decoder produces the fine authentic reconstruction during the fraction of time the corresponding codebook is in effect, and produces the coarse reconstruction for the remaining time (since the broadcast channel is a degraded one). However, as Fig. 10 also illustrates, this approach is in general quite inefficient: the use of such time-sharing results in a substantially smaller achievable region.
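
The distortions delivered by such a time-sharing scheme follow directly from the description above, as the following Python sketch illustrates. It assumes a zero-mean source, so that outputting zero incurs distortion $\sigma_S^2$, and it takes the per-codebook distortions as given inputs; the names `lam`, `d_harsh_code`, and `d_mild_code` are hypothetical.

```python
def time_sharing_distortions(lam, d_harsh_code, d_mild_code, var_s):
    """Average (fine, coarse) distortions when a fraction `lam` of the time
    uses the harsh-edit codebook and the rest uses the mild-edit codebook.

    d_harsh_code: distortion of the reconstruction from the harsh-edit codebook
    d_mild_code:  distortion of the reconstruction from the mild-edit codebook
    var_s:        source variance (distortion incurred when outputting zero)
    """
    # Harsh-edit channel in effect: the mild-edit codebook cannot be decoded,
    # so the decoder outputs zero during that fraction of the time.
    d_coarse = lam * d_harsh_code + (1 - lam) * var_s
    # Mild-edit channel in effect: both codebooks can be decoded.
    d_fine = (1 - lam) * d_mild_code + lam * d_harsh_code
    return d_fine, d_coarse
```

The term `(1 - lam) * var_s` in the coarse distortion is what makes time-sharing costly: any time devoted to the mild-edit codebook is entirely wasted when the harsh-edit channel is in effect.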

X. Concluding Remarks 

This paper develops one meaningful formulation for authentication problems in which the content may undergo a variety of types of legitimate editing prior to authentication. As part of this formulation, we adopt a particular formal notion of security in such settings. For such a formulation, and with the simplest classes of models, we establish that secure authentication systems can be constructed, and subsequently analyze their fundamental performance limits. From these models, we further develop how such systems offer significant advantages over other proposed solutions.

Many opportunities for further research remain. For example, extensions of the main results to richer content, semantic, and edit models may provide additional insights into the behavior of such systems. It would also be useful to understand the degree to which robust and/or universal solutions exist for the problem; such approaches seek to avoid requiring accurate prior model knowledge during system design.

There are additional opportunities to further refine the analysis even for the existing models. For example, characterizing the manner in which asymptotic limits are approached (for example, via error exponents) would provide useful engineering insights. Likewise, further analyzing public-key formulations, in which edits are more generally subject to computational constraints, could also be revealing. From this perspective, the Appendix represents but a starting point.

More generally, identifying and relating other meaningful 
notions of security for such problems would be particularly 
useful in putting the results of this paper in perspective. For 
example, a broader unifying framework for characterizing 
and comparing different notions of security could provide a 
mechanism for selecting a formulation best matched to the 
social needs and/or engineering constraints at hand. 

Finally, there are many interesting questions about how to best approach the development of practical authentication systems based on these ideas. These include questions of customized code design and implementation, but also architectural issues concerning the degree to which these systems can be built from interconnections of existing and often standardized components, i.e., existing compression systems, error-control codes, and public-key cryptographic tools.

Appendix 

A Public-Key Adaptation of the Private-Key 
Authentication System Model 

To simplify the analysis we have focused on private-key systems where the encoder and decoder share a secret key $\theta$, which is kept hidden from editors. In most practical applications, however, it is more convenient to use public-key systems where a public key $\theta_p$ is known to all parties (including editors) while a signing key, $\theta_s$, is known only to the encoder. The advantage of public-key systems is that while only the encoder possessing $\theta_s$ can encode, anyone possessing $\theta_p$ can decode and verify a properly encoded signal. In this section, we briefly describe how a secret-key authentication system can be combined with a generic digital signature scheme to yield a public-key system. Some additional aspects of such an implementation are discussed in, e.g., [49], [50].

Fig. 10. Achievable fine and coarse quality reconstruction distortion pairs $(D_r^f, D_r^c)$ in a layered authentication system for the Gaussian-quadratic case with $\sigma_S^2/\sigma_N^2 = 30$ dB, $\sigma_V^2/\sigma_N^2 = 10$ dB, and $\sigma_N^2 = 1$. From left to right, the curves are the boundaries of achievable distortion regions corresponding to encoding distortions of $D_e/\sigma_N^2 = 10, 5, 0, -5, -10$ dB. The dashed curve corresponds to time-sharing between two operating points on the $D_e/\sigma_N^2 = 0$ dB curve.

A digital signature scheme consists of a signing function $\tau = S(m, \theta_s)$ and a verifying function $V(m, \tau, \theta_p)$. Specifically, the signing function maps an arbitrary length message $m$ to a $\gamma$-bit tag $\tau$ using the signing key $\theta_s$. The verifying function returns true (with high probability) when given a message, public key, and tag generated using the signing function with the corresponding signing key. Furthermore, it is computationally infeasible to produce a tag accepted by the verifier without using the signing key. Many such digital signature schemes have been described in the cryptography literature where $\tau$ requires a number of bits that is sub-linear in $n$ or even finite.

Modified Encoder:

1) The public key of the digital signature scheme is published, and there is no secret key (equivalently, the secret key in our original formulation is simply published).

2) The encoder uses the original authentication system to map the source $S^n$ to $\tilde{X}^n = \Upsilon_n(S^n)$.

3) For a system like the one described in Section V-A, there are a finite number of possible values for the authentic reconstruction $\hat{S}^n$, and the authentic reconstruction is a deterministic function of $S^n$. Thus each reconstruction can be assigned a bitwise representation $c(\hat{S}^n)$, from which the encoder computes the digital signature tag $\tau = S(c(\hat{S}^n), \theta_s)$ using the digital signature algorithm.

4) Finally the signature $\tau$ is embedded into $\tilde{X}^n$, producing $X^n$, using an information embedding (data hiding) algorithm. The chosen algorithm can be quite crude since $\tau$ only requires a sub-linear number of bits. The algorithm parameters are chosen so that the embedding incurs asymptotically negligible additional distortion to the overall encoding process.

Modified Decoder:

1) The decoder extracts from $Y^n$ an estimate $\hat{\tau}$ of the embedded signature $\tau$. Since the size of $\tau$ is sub-linear, the embedding algorithm parameters can be further chosen so that $\hat{\tau} = \tau$ with arbitrarily high probability when the reference channel is in effect.

2) Next, the decoder uses the original authentication system to produce $\tilde{S}^n = \Phi_n(Y^n)$, and then, in turn, its bitwise representation $c(\tilde{S}^n)$.

3) The decoder checks whether the digital signature verifying algorithm $V(c(\tilde{S}^n), \hat{\tau}, \theta_p)$ accepts $\tilde{S}^n$ as valid.

4) If so, then the decoder produces the authentic reconstruction $\hat{S}^n = \tilde{S}^n$. Otherwise, the decoder produces the special symbol $\emptyset$, declaring that it is unable to authenticate.
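
The sign-then-verify portion of this construction (encoder step 3 and decoder steps 3 and 4) can be sketched in Python using only the standard library. Because the standard library has no public-key signature primitive, the sketch substitutes a toy Lamport one-time signature built from SHA-256; this is a stand-in chosen for self-containment, not the scheme assumed in the paper, and it is only secure for signing a single message per key. The underlying authentication system and the embedding and extraction of the tag are not modeled: the bitwise representation of the reconstruction is simply passed in as `bytes`, and all function names are hypothetical.

```python
import hashlib
import secrets


def _hash(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def keygen():
    """Toy Lamport key pair: (signing key theta_s, public key theta_p)."""
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    pk = [(_hash(a), _hash(b)) for a, b in sk]
    return sk, pk


def _digest_bits(message: bytes):
    """The 256 bits of SHA-256(message), most significant bit first."""
    digest = _hash(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


def sign(message: bytes, sk):
    """tau = S(m, theta_s): reveal one secret per bit of the message digest."""
    return [sk[i][bit] for i, bit in enumerate(_digest_bits(message))]


def verify(message: bytes, tag, pk) -> bool:
    """V(m, tau, theta_p): check each revealed secret against the public key."""
    bits = _digest_bits(message)
    return len(tag) == 256 and all(
        _hash(t) == pk[i][bit] for i, (t, bit) in enumerate(zip(tag, bits))
    )


def encoder_sign_reconstruction(recon_bits: bytes, sk):
    """Encoder step 3: sign the bitwise representation of the reconstruction."""
    return sign(recon_bits, sk)


def decoder_authenticate(candidate_bits: bytes, extracted_tag, pk):
    """Decoder steps 3-4: return the candidate if the tag verifies, else None
    (None plays the role of the special 'unable to authenticate' symbol)."""
    return candidate_bits if verify(candidate_bits, extracted_tag, pk) else None
```

A usage pattern would be to call `keygen()` once, sign the representation of the reconstruction at the encoder, and pass the extracted tag together with the decoder's candidate reconstruction to `decoder_authenticate`; a candidate that differs from the signed reconstruction is rejected. Note that a Lamport tag has a fixed size independent of $n$, which corresponds to the "finite" tag-length case mentioned above.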

With this construction, we see that the security of such a system is determined by the security of the underlying public-key digital signature scheme used. Specifically, the only way an attacker can defeat the system is to find a matching $\hat{S}^n$ and $\tau$ accepted by the digital signature verifying algorithm. All other performance aspects of the system are effectively unchanged.